Abstract

We present a collection of new results on problems related to 3SUM, including:

• The first truly subquadratic algorithm for 

- computing the (min,+) convolution for monotone increasing sequences with integer values
bounded by $O(n)$,

- solving 3SUM for monotone sets in 2D with integer coordinates bounded by $O(n)$, and

- preprocessing a binary string for histogram indexing (also called jumbled indexing). 

The running time is $O(n^{(9+\sqrt{177})/12}\,\mathrm{polylog}\,n) = O(n^{1.859})$ with randomization, or $O(n^{1.864})$
deterministically. This greatly improves the previous $n^2/2^{\Omega(\sqrt{\log n})}$ time bound obtained from
Williams' recent result on all-pairs shortest paths [STOC'14], and answers an open question raised
by several researchers studying the histogram indexing problem.

• The first algorithm for histogram indexing for any constant alphabet size that achieves truly
subquadratic preprocessing time and truly sublinear query time.

• A truly subquadratic algorithm for integer 3SUM in the case when the given set can be partitioned
into $n^{1-\delta}$ clusters each covered by an interval of length $n$, for any constant $\delta > 0$.

• An algorithm to preprocess any set of $n$ integers so that subsequently 3SUM on any given subset
can be solved in $O(n^{13/7}\,\mathrm{polylog}\,n)$ time.

All these results are obtained by a surprising new technique, based on the Balog-Szemeredi-Gowers 
Theorem from additive combinatorics. 
1 Introduction


1.1 Motivation: Bounded Monotone (min,+) Convolution 

Our work touches on two of the most tantalizing open algorithmic questions: 

• Is there a truly subcubic ($O(n^{3-\delta})$-time) algorithm for all-pairs shortest paths (APSP) in general
dense edge-weighted graphs? If all the edge weights are small integers bounded by a constant, then
the answer is known to be yes, using fast matrix multiplication, but the question remains open
not only for arbitrary real weights, but even for integer weights in, say, $[n]$.¹ The current best
combinatorial algorithms run in slightly subcubic $O((n^3/\log^2 n)(\log\log n)^{O(1)})$ time [13, 21]. The recent
breakthrough by Williams [37] achieves $n^3/2^{\Omega(\sqrt{\log n})}$ expected time (using fast rectangular matrix
multiplication).

• Is there a truly subquadratic ($O(n^{2-\delta})$-time) algorithm for the 3SUM problem? One way to state the
problem (out of several equivalent ways) is: given sets $A, B, S$ of size $n$, decide whether there exists
a triple $(a, b, s) \in A \times B \times S$ such that $a + b = s$; in other words, decide whether $(A + B) \cap S \ne \emptyset$.
All 3SUM algorithms we are aware of actually solve a slight extension which we will call the 3SUM+
problem: decide for every element $s \in S$ whether $a + b = s$ for some $(a, b) \in A \times B$; in other words,
report all elements in $(A + B) \cap S$. If $A, B, S \subseteq [cn]$, then the problem can be solved in $O(cn \log n)$
time by fast Fourier transform (FFT), since it reduces to convolution for 0-1 sequences of length $cn$.
However, the question remains open for general real values, or just integers from $[n]^2$. The myriad
"3SUM-hardness" results showing reductions from both real and integer 3SUM to different problems
about computational geometry, graphs, and strings [16, 30, 35, 23] tacitly assume
that the answer could be no. The current best algorithm for integer 3SUM or 3SUM+ by Baran et
al. runs in slightly subquadratic $O((n^2/\log^2 n)(\log\log n)^2)$ expected time. Grønlund and Pettie's
recent breakthrough for general real 3SUM or 3SUM+ [20] achieves $O((n^2/\log^{2/3} n)(\log\log n)^{2/3})$
deterministic time or $O((n^2/\log n)(\log\log n)^2)$ expected time.
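The small-universe case mentioned in the bullet above (3SUM+ for $A, B, S \subseteq [cn]$ via convolution of 0-1 sequences) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and exact big-integer multiplication is used as a stand-in for the FFT so that the example needs nothing beyond the standard library.

```python
def convolve_01(u, v):
    """Exact convolution of two 0/1 lists.

    Assumption of this sketch: big-integer multiplication replaces the FFT.
    Each coefficient is stored in its own fixed-width binary digit.
    """
    # Every coefficient is at most min(len(u), len(v)), which fits in `width` bits.
    width = min(len(u), len(v)).bit_length()
    pack = lambda w: sum(bit << (i * width) for i, bit in enumerate(w))
    prod = pack(u) * pack(v)
    mask = (1 << width) - 1
    return [(prod >> (k * width)) & mask for k in range(len(u) + len(v) - 1)]


def three_sum_plus(A, B, S, universe):
    """Report all s in S with s = a + b for some a in A, b in B.

    A and B must be subsets of {0, ..., universe - 1}.
    """
    ind_a = [0] * universe
    ind_b = [0] * universe
    for a in A:
        ind_a[a] = 1
    for b in B:
        ind_b[b] = 1
    counts = convolve_01(ind_a, ind_b)  # counts[s] = number of pairs with a + b = s
    return sorted(s for s in S if 0 <= s < len(counts) and counts[s] > 0)
```

For instance, `three_sum_plus({1, 4}, {2, 3}, {3, 5, 6, 8}, 5)` reports the elements 3 and 6 of $S$.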

Our starting point concerns one of the most basic special cases—of both problems simultaneously—for 
which finding a truly subquadratic algorithm has remained open. Put another way, solving the problem 
below is a prerequisite towards solving APSP in truly subcubic or 3SUM+ in truly subquadratic time:

The Bounded Monotone (min,+) Convolution Problem: Given two monotone increasing sequences
$a_0, \ldots, a_{n-1}$ and $b_0, \ldots, b_{n-1}$ lying in $[O(n)]$, compute their (min,+) convolution $s_0, \ldots, s_{2n-2}$,
where $s_k = \min_{i=0}^{k} (a_i + b_{k-i})$.
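As a reference point, the definition above can be evaluated directly in quadratic time. The sketch below is only this naive baseline (not the subquadratic algorithm of the paper); the function name is hypothetical, and terms whose indices fall outside $0, \ldots, n-1$ are simply skipped.

```python
def min_plus_convolution(a, b):
    """Naive O(n^2) (min,+) convolution of two equal-length sequences."""
    n = len(a)
    assert len(b) == n
    s = []
    for k in range(2 * n - 1):
        # Only indices i with 0 <= i <= n-1 and 0 <= k-i <= n-1 contribute.
        first, last = max(0, k - n + 1), min(k, n - 1)
        s.append(min(a[i] + b[k - i] for i in range(first, last + 1)))
    return s
```

For the monotone sequences `a = [0, 1, 3]` and `b = [0, 2, 2]`, the entries are $s_0 = 0$, $s_1 = 1$, $s_2 = 2$, $s_3 = 3$, $s_4 = 5$.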

If all the $a_i$'s and $b_i$'s are small integers bounded by $c$, then (min,+) convolution can be reduced to
classical convolution and can thus be computed in $O(cn \log n)$ time by FFT. If the differences $a_{i+1} - a_i$
and $b_{i+1} - b_i$ are randomly chosen from $\{0,1\}$, then we can subtract a linear function $i/2$ from $a_i$ and
$b_i$ to get sequences lying in a smaller range $[\tilde{O}(\sqrt{n})]$ and thus solve the problem by FFT in $\tilde{O}(n^{3/2})$ time
w.h.p.² However, these observations do not seem to help in obtaining truly subquadratic worst-case time for
arbitrary bounded monotone sequences.

We reveal the connection to APSP and 3SUM+: 

¹ $[n]$ denotes $\{0, 1, \ldots, n-1\}$.

² The $\tilde{O}$ notation hides polylogarithmic factors; "w.h.p." means "with high probability", i.e., with probability at least $1 - 1/n^c$
for input size $n$ and an arbitrarily large constant $c$.


• A simple argument [10] shows that (min,+) convolution can be reduced to (min,+) matrix
multiplication, which in turn is known to be equivalent to APSP. More precisely, if we can compute the
(min,+) matrix multiplication of two $n \times n$ matrices, or solve APSP, in $T(n)$ time, then we can
compute the (min,+) convolution of two sequences of length $n$ in $O(\sqrt{n}\,T(\sqrt{n}))$ time. The APSP result
by Williams immediately leads to an $n^2/2^{\Omega(\sqrt{\log n})}$-time algorithm for (min,+) convolution, the best
result known to date. The challenge is to see if the bounded monotone case can be solved more
quickly.

• Alternatively, we observe that the bounded monotone (min,+) convolution problem can be reduced
to 3SUM+ for integer point sets in 2D, with at most a logarithmic-factor slowdown, by setting
$A = \{(i, a) : a_i \le a \le a_{i+1}\}$ and $B = \{(i, b) : b_i \le b \le b_{i+1}\}$ in $[O(n)]^2$, and using $O(\log n)$
appropriately chosen sets $S$ via a simultaneous binary search for all the minima (see Section 3.2 for
the details). Two-dimensional 3SUM+ in $[O(n)]^2$ can be easily reduced to one-dimensional 3SUM+
in $[O(n^2)]$. The current best result for integer 3SUM+ leads to worse bounds, but the above reduction
requires only a special case of 3SUM+, when the points in each of the 2D sets $A, B, S$ in $[O(n)]^2$ form
a monotone increasing sequence in both coordinates simultaneously. The hope is that the 3SUM+ in
this 2D monotone case can be solved more quickly.

The bounded monotone (min,+) convolution problem has a number of different natural formulations and
applications:

• Computing the (min,+) convolution for two integer sequences in the bounded differences case, where
$|a_{i+1} - a_i|, |b_{i+1} - b_i| \le c$ for some constant $c$, can be reduced to the bounded monotone case by just
adding a linear function $ci$ to both $a_i$ and $b_i$. (The two cases turn out to be equivalent; see Remark 3.5.)

• Our original motivation concerns histogram indexing (a.k.a. jumbled indexing) for a binary alphabet:
the problem is to preprocess a string $c_1 \cdots c_n \in \{0,1\}^*$, so that we can decide whether there is a
substring with exactly $i$ 0's and $j$ 1's for any given $i, j$ (or equivalently, with length $k$ and exactly $j$ 1's for
any given $j, k$). Histogram indexing has been studied in over a dozen papers in the string algorithms
literature in the last several years, and the question of obtaining a truly subquadratic preprocessing
algorithm in the binary alphabet case has been raised several times (e.g., see [11, 27] and the
introduction of EJj for a more detailed survey). In the binary case, preprocessing amounts to computing
the minimum number $s_k$ (and similarly the maximum number $s'_k$) of 1's over all length-$k$ substrings
for every $k$. Setting $a_i$ to be the prefix sum $c_1 + \cdots + c_i$, we see that $s_k = \min_{i=k}^{n} (a_i - a_{i-k})$, which is
precisely a (min,+) convolution after negating and reversing the second sequence. The sequences are
monotone increasing and lie in $\pm[n]$ (and incidentally also satisfy the bounded differences property).
Thus, binary histogram indexing can be reduced to bounded monotone (min,+) convolution. (In fact,
the two problems turn out to be equivalent; see Remark 3.7.)

• In another formulation of the problem, we are given $n$ integers in $[O(n)]$ and want to find an interval of
length $\ell$ containing the smallest/largest number of elements, for every $\ell \in [O(n)]$; or find an interval
containing $k$ elements with the smallest/largest length, for every $k \in [n]$. This is easily seen to be
equivalent to binary histogram indexing.

• For yet another application, a "necklace alignment" problem studied by Bremner et al. [10], when
restricted to input sequences in $[O(n)]$, can also be reduced to bounded monotone (min,+) convolution.
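The binary histogram indexing formulation in the list above can be illustrated with a direct quadratic-time preprocessing. This sketch is not the paper's algorithm: it computes $s_k$ and $s'_k$ from prefix sums by brute force, and the query relies on the standard fact that, in a binary string, the number of 1's among length-$k$ substrings takes every value between its minimum and maximum (sliding the window by one position changes the count by at most one). The function names are hypothetical.

```python
def preprocess(c):
    """Return (lo, hi): lo[k], hi[k] = min, max number of 1's over length-k substrings."""
    n = len(c)
    prefix = [0]
    for bit in c:
        prefix.append(prefix[-1] + bit)  # prefix[i] = c_1 + ... + c_i
    lo = [0] * (n + 1)
    hi = [0] * (n + 1)
    for k in range(1, n + 1):
        window = [prefix[i] - prefix[i - k] for i in range(k, n + 1)]
        lo[k], hi[k] = min(window), max(window)
    return lo, hi


def query(lo, hi, zeros, ones):
    """Is there a nonempty substring with exactly `zeros` 0's and `ones` 1's?"""
    k = zeros + ones
    return 0 < k < len(lo) and lo[k] <= ones <= hi[k]
```

After preprocessing, each query takes constant time; only the preprocessing step is what the bounded monotone (min,+) convolution algorithm is meant to speed up.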
1.2 New Result


We present the first truly subquadratic algorithm for bounded monotone (min,+) convolution, and thus for all
its related applications such as binary histogram indexing. The randomized version of our algorithm runs in
$O(n^{1.859})$ expected time; the curious-looking exponent is more precisely $(9 + \sqrt{177})/12$. The deterministic
version of the algorithm has a slightly worse $O(n^{1.864})$ running time. Our randomized algorithm uses FFT,
while our deterministic algorithm uses both FFT and fast (rectangular) matrix multiplication.

1.3 New Technique via Additive Combinatorics 

Even more interesting than the specific result is our solution, which surprisingly relies on tools from a 
different area: additive combinatorics. We explain how we are led to that direction. 

It is more convenient to consider the reformulation of the bounded monotone (min,+) convolution
problem, in terms of solving 3SUM+ over certain 2D monotone sets $A, B, S$ in $[O(n)]^2$, as mentioned earlier.
A natural approach to get a truly subquadratic algorithm is via divide-and-conquer. For example, we can
partition each input set into subsets by considering a grid of side length $\ell$ and taking all the nonempty
grid cells; because of monotonicity of the sets, there are $O(n/\ell)$ nonempty grid cells each containing $O(\ell)$
points. For every pair of a nonempty grid cell of $A$ and a nonempty grid cell of $B$, if their sum lands in or
near a nonempty grid cell of $S$, we need to recursively solve the problem for the subsets of points in these
cells. "Usually", not many pairs out of the $O((n/\ell)^2)$ possible pairs would satisfy this condition and require
recursive calls. However, there are exceptions; the most obvious case is when the nonempty grid cells of
$A, B, S$ all lie on or near a line. But in that case, we can subtract a linear function from the y-coordinates to
make all y-values small integers, and then solve the problem by FFT directly!

Thus, we seek some combinatorial theorem roughly stating that if many pairs of $A \times B$ have sum in or
near $S$, the sets $A$ and $B$ must be "special", namely, close to a line. It turns out that the celebrated
Balog–Szemerédi–Gowers Theorem (henceforth, the BSG Theorem) from additive combinatorics accomplishes
exactly what we need. One version of the BSG Theorem (out of several different versions) states:

Given sets $A, B, S$ of size $N$ in any abelian group such that $|\{(a, b) \in A \times B : a + b \in S\}| \ge \alpha N^2$,
we must have $|A' + B'| \le O((1/\alpha)^5 N)$ for some large subsets $A' \subseteq A$ and $B' \subseteq B$ with
$|A'|, |B'| \ge \Omega(\alpha N)$.

(We will apply the theorem to the sets of nonempty grid cells in $\mathbb{Z}^2$, with $N = O(n/\ell)$.)

The original proof by Balog and Szemerédi used heavy machinery, namely, the regularity lemma,
and had a much weaker superexponential dependency on $\alpha$. A much simpler proof with a polynomial
$\alpha$-dependency later appeared in a (small part of a famous) paper by Gowers [19]. Balog [4] and Sudakov et
al. [33] further refined the factor to the stated $(1/\alpha)^5$. Since then, the theorem has appeared in books
and surveys [26, 36]. Although additive combinatorics, and specifically the BSG Theorem, have found some
applications in theoretical computer science before [26, 36] (for example, in property testing [8]), we are
not aware of any applications in classical algorithms; we believe this adds further interest to our work.

Four points about the BSG Theorem statement are relevant to our algorithmic applications: 

• First, as it reveals, the right criterion of "special" is not that the two sets $A$ and $B$ are close to a
line, but rather that their sumset $A + B$ has small size. According to another celebrated theorem
from additive combinatorics, Freiman's Theorem [18], if a sumset $A + A$ has size $O(|A|)$, then $A$
indeed has special structure in the sense that it must be close to a projection of a higher-dimensional
lattice. Fortunately, we do not need this theorem (which requires a more complicated proof and has
superexponential $\alpha$-dependency): if $A + B$ has small size, we can actually compute $A + B$ by FFT
directly, as explained in the "FFT Lemma" of Section 2.

• Second, the theorem does not state that $A$ and $B$ themselves must be special, but rather that we can
extract large subsets $A'$ and $B'$ which are special. In our applications, we need to "cover" all possible
pairs in $\{(a, b) \in A \times B : a + b \in S\}$, and so we need a stronger version of the BSG Theorem
which allows us to remove already covered pairs and iterate. Fortunately, Balog [4] and Sudakov et
al. [33] provided a version of the BSG Theorem that did precisely this; see Section 2 for the precise
statement. The resulting corollary on pairs covering is dubbed the "BSG Corollary" in Section 2.

• The BSG Theorem was originally conceived with the setting of constant $\alpha$ in mind, but polynomial
$\alpha$-dependency (which fortunately we have) will be critical in obtaining truly subquadratic algorithms
in our applications, as we need to choose $\alpha$ (and $\ell$) to balance the contribution of the "usual" vs.
"special" cases.

• The BSG Theorem is originally a mathematical result, but the time complexity of the construction
will matter in our applications. We present, to our knowledge, the first time bounds in Theorem 2.3.

Once all the components involving the BSG Corollary and the FFT Lemma are in place, our main
algorithm for bounded monotone (min,+) convolution can be described simply, as revealed in Section 3.

1.4 Other Consequences of the New Technique 

The problem we have started with, bounded monotone (min,+) convolution, is just one of many applications 
that can be solved with this technique. We briefly list our other results: 

• We can solve 3SUM+ not only in the 2D monotone case, but also in the d-dimensional monotone case
in truly subquadratic $\tilde{O}(n^{(11-d+\sqrt{(d-11)^2+48d})/12})$ expected time for any constant $d$ (Theorem 3.1). If
only $A$ and $B$ are monotone, a slightly weaker bound of $\tilde{O}(n^{2-2/(d+13)})$ still holds; if just $A$ is monotone,
another weaker bound holds (Theorem 4.5).

• In 1D, we can solve integer 3SUM+ in truly subquadratic time if the input sets are clustered
in the sense that they can be covered by $n^{1-\delta}$ intervals of length $n$ (Corollary 4.3). In fact, just one
of the sets $A$ needs to be clustered. This is the most general setting of 3SUM we know that can be
solved in truly subquadratic time (hence, the title of the paper). In some sense, it "explains" all the
other results. For example, d-dimensional monotone sets, when mapped down to 1D in an appropriate
way, become clustered integer sets.

• We can also solve a data structure version of 3SUM+ where $S$ is given online: preprocess $A$ and
$B$ so that we can decide whether any query point $s$ is in $A + B$. For example, if $A$ and $B$ are
monotone in $[n]^d$, we get truly subquadratic expected preprocessing time and truly sublinear
$\tilde{O}(n^{2/3+\delta(d+13)/6})$ query time for any sufficiently small $\delta > 0$ (Corollary 5.3).

• As an immediate application, we can solve the histogram indexing problem for any constant alphabet
size $d$: we can preprocess any string $c_1 \cdots c_n \in [d]^*$ in truly subquadratic $\tilde{O}(n^{2-\delta})$ expected
time, so that we can decide whether there is a substring whose vector of character counts matches
exactly a query vector in truly sublinear time for any sufficiently small $\delta > 0$
(Corollary 5.4). This answers an open question and improves a recent work by Kociumaka et al. [24].
Furthermore, if $n$ queries are given offline, we can answer all queries in total $\tilde{O}(n^{2-2/(d+13)})$ expected
time (Corollary 4.6). As $d$ gets large, this upper bound approaches a conditional lower bound recently
shown by Amir et al. [5].

• For another intriguing consequence, we can preprocess any universes $A_0, B_0, S_0 \subseteq \mathbb{Z}$ of $n$ integers
so that given any subsets $A \subseteq A_0$, $B \subseteq B_0$, $S \subseteq S_0$, we can solve 3SUM+ for $A, B, S$ in truly
subquadratic $\tilde{O}(n^{13/7})$ time (Theorem 6.1). Remarkably, this is a result about general integer sets.
One of the results in Bansal and Williams' paper mentioned precisely this problem but obtained
much weaker polylogarithmic speedups. When $S_0$ is not given, we can still achieve $\tilde{O}(n^{1.9})$ time
(Theorem 6.2).


2 Ingredients: The BSG Theorem/Corollary and FFT Lemma 

As noted in the introduction, the key ingredient behind all of our results is the Balog-Szemeredi-Gowers 
Theorem. Below, we state the particular version of the theorem we need, which can be found in the papers 
by Balog [4] and Sudakov et al. [33]. A complete proof is redescribed in Sections 7.1 and 7.3.

Theorem 2.1. (BSG Theorem) Let $A$ and $B$ be finite subsets of an abelian group, and $G \subseteq A \times B$.
Suppose that $|A||B| = \Theta(N^2)$, $|\{a + b : (a, b) \in G\}| \le tN$, and $|G| \ge \alpha N^2$. Then there exist subsets
$A' \subseteq A$ and $B' \subseteq B$ such that

(i) $|A' + B'| \le O((1/\alpha)^5 t^3 N)$, and

(ii) $|G \cap (A' \times B')| \ge \Omega(\alpha |A'||B|) \ge \Omega(\alpha^2 N^2)$.

The main case to keep in mind is when $|A| = |B| = N$ and $t = 1$, which is sufficient for many of our
applications, although the more general "asymmetric" setting does arise in at least two of the applications.

In some versions of the BSG Theorem, $A = B$ (or $A = -B$) and we further insist that $A' = B'$ (or
$A' = -B'$); there, the $\alpha$-dependencies are a bit worse.

In some simpler versions of the BSG Theorem that appeared in many papers (including the version
mentioned in the introduction), we are not given $G$, but rather a set $S$ of size $tN$ with
$|\{(a, b) : a + b \in S\}| \ge \alpha N^2$; in other words, we are considering the special case $G = \{(a, b) : a + b \in S\}$. Condition (ii) is
replaced by $|A'|, |B'| \ge \Omega(\alpha N)$. For our applications, it is crucial to consider the version with a general $G$.
This is because of the need to apply the theorem iteratively.

If we apply the theorem iteratively, starting with $G = \{(a, b) : a + b \in S\}$ for a given set $S$, and
repeatedly removing $A' \times B'$ from $G$, we obtain the following corollary, which is the combinatorial result
we will actually use in all our applications (and which, to our knowledge, has not been stated explicitly
before):

Corollary 2.2. (BSG Corollary) Let $A, B, S$ be finite subsets of an abelian group. Suppose that $|A||B| = O(N^2)$
and $|S| \le tN$. For any $\alpha < 1$, there exist subsets $A_1, \ldots, A_k \subseteq A$ and $B_1, \ldots, B_k \subseteq B$ such that

(i) the remainder set $R = \{(a, b) \in A \times B : a + b \in S\} \setminus \bigcup_{i=1}^{k} (A_i \times B_i)$ has size at most $\alpha N^2$,

(ii) $|A_i + B_i| \le O((1/\alpha)^5 t^3 N)$ for each $i = 1, \ldots, k$, and

(iii) $k = O(1/\alpha)$.




A naive argument gives only $k = O((1/\alpha)^2)$, as each iteration removes $\Omega(\alpha^2 |A||B|)$ edges from $G$, but
a slightly more refined analysis, given in Section 7.4, lowers the bound to $k = O(1/\alpha)$.

None of the previous papers on the BSG Theorem addresses the running time of the construction, which 
will of course be important for our algorithmic applications. A polynomial time bound can be easily seen 
from most known proofs of the BSG Theorem, and is already sufficient to yield some nontrivial result for 
bounded monotone (min,+)-convolution and binary histogram indexing. However, a quadratic time bound 
is necessary to get nontrivial results for other applications such as histogram indexing for larger alphabet 
sizes. In Sections 7.2 and 7.3, we show that the construction in the BSG Theorem/Corollary can indeed be
done in near quadratic time with randomization, or in matrix multiplication time deterministically.

Theorem 2.3. In the BSG Corollary, the subsets $A_1, \ldots, A_k, B_1, \ldots, B_k$, the remainder set $R$, and all the
sumsets $A_i + B_i$ can be constructed by

(i) a deterministic algorithm in time $O((1/\alpha)^{0.4651} N^{2.3729})$, or more precisely,
$O((1/\alpha) M(\alpha|A|, |A|, |B|))$, where $M(n_1, n_2, n_3)$ is the complexity of multiplying an $n_1 \times n_2$ and
an $n_2 \times n_3$ matrix, or

(ii) a randomized Las Vegas algorithm in expected time $\tilde{O}(N^2)$ for $t \ge 1$, or $\tilde{O}(N^2 + (1/\alpha)^5 |A|)$
otherwise.

We need one more ingredient. The BSG Theorem/Corollary produces subsets that have small sumsets. 
The following lemma shows that if the sumset is small, we can compute the sumset efficiently: 

Lemma 2.4. (FFT Lemma) Given sets $A, B \subseteq [U]^d$ of size $O(N)$ for a constant $d$ with $|A + B| \le O(N)$,
and given a set $T$ of size $O(N)$ which is known to be a superset of $A + B$, we can compute $A + B$ by

(i) a randomized Las Vegas algorithm in $\tilde{O}(N)$ expected time, or

(ii) a deterministic algorithm that runs in $\tilde{O}(N)$ time after preprocessing $T$ in $\tilde{O}(N^{1+\varepsilon})$ time for an
arbitrarily small constant $\varepsilon > 0$.

As the name indicates, the proof of the lemma uses fast Fourier transform. The randomized version
was proved by Cole and Hariharan [14], who actually obtained a more general result where the superset
$T$ need not be given: they addressed the problem of computing the (classical) convolution of two sparse
vectors and presented a Las Vegas algorithm that runs in time sensitive to the number of nonzero entries
in the output vector; computing the sumset $A + B$ can be viewed as an instance of the sparse convolution
problem and can be solved by their algorithm in $O(|A + B| \log^2 N)$ expected time. Amir et al. have
given a derandomization technique for a related problem (sparse wildcard matching), which can also produce
a deterministic algorithm for computing $A + B$ in the setting when $T$ is given and has been preprocessed,
but the preprocessing of $T$ requires $\tilde{O}(N^2)$ time.

In Section 8, we give self-contained proofs of both the randomized and deterministic versions of the
FFT Lemma. For the randomized version, we do not need the extra complications of Cole and Hariharan's
algorithm, since $T$ is given in our applications. For the deterministic version, we significantly reduce Amir
et al.'s preprocessing cost to $\tilde{O}(N^{1+\varepsilon})$, which is of independent interest.

3 3SUM+ for Monotone Sets in $[n]^d$

We say that a set in $\mathbb{Z}^d$ is monotone (increasing/decreasing) if it can be written as $\{a_1, \ldots, a_m\}$ where the
$j$-th coordinates of $a_1, \ldots, a_m$ form a monotone (increasing/decreasing) sequence for each $j = 1, \ldots, d$.
Note that a monotone set in $[n]^d$ can have size at most $dn$.




3.1 The Main Algorithm 

Theorem 3.1. Given monotone sets $A, B, S \subseteq [n]^d$ for a constant $d$, we can solve 3SUM+ by

(i) a randomized Las Vegas algorithm in expected time $\tilde{O}(n^{(9+\sqrt{177})/12}) = O(n^{1.859})$ for $d = 2$,
$\tilde{O}(n^{(8+\sqrt{208})/12}) = O(n^{1.869})$ for $d = 3$, or more generally, $\tilde{O}(n^{(11-d+\sqrt{(d-11)^2+48d})/12})$ for any $d$,
or

(ii) a deterministic algorithm in time $O(n^{1.864})$ for $d = 2$, and in truly subquadratic time for each of
$d = 3, 4, 5, 6, 7$.

Proof. Divide $[n]^d$ into $O((n/\ell)^d)$ grid cells of side length $\ell$, for some parameter $\ell$ to be set later. Define
cell(p) to be a label (in $\mathbb{Z}^d$) of the grid cell containing the point $p$; more precisely,
$\mathrm{cell}(x_1, \ldots, x_d) := (\lfloor x_1/\ell \rfloor, \ldots, \lfloor x_d/\ell \rfloor)$.

We assume that all points $(x_1, \ldots, x_d) \in A$ satisfy $x_j \bmod \ell < \ell/2$ for every $j = 1, \ldots, d$; when this is
true, we say that $A$ is aligned. This is without loss of generality, since $A$ can be decomposed into a constant
($2^d$) number of subsets, each of which is a translated copy of an aligned set, by shifting selected coordinate
positions by $\ell/2$. Similarly, we may assume that $B$ is aligned. By alignedness, the following property holds:
for any $a \in A$ and $b \in B$, $s = a + b$ implies $\mathrm{cell}(s) = \mathrm{cell}(a) + \mathrm{cell}(b)$.
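The alignedness property can be checked on a small example. The sketch below is illustrative only; the helper names `cell`, `is_aligned`, and `add` are hypothetical and simply mirror the definitions in the preceding paragraphs (points are tuples of non-negative integers).

```python
def cell(p, ell):
    """Label of the grid cell of side length ell containing point p."""
    return tuple(x // ell for x in p)


def is_aligned(points, ell):
    """True if every coordinate x of every point satisfies x mod ell < ell / 2."""
    return all(x % ell < ell / 2 for p in points for x in p)


def add(p, q):
    """Coordinate-wise sum of two points."""
    return tuple(x + y for x, y in zip(p, q))
```

When both remainders are below $\ell/2$, their sum is below $\ell$, so no carry into the next cell occurs; without alignment the property can fail, e.g. for $\ell = 4$ the points $(3)$ and $(3)$ both lie in cell 0 while their sum $(6)$ lies in cell 1.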

Our algorithm works as follows: 

Step 0: Apply the BSG Corollary to the sets $A^* = \{\mathrm{cell}(a) : a \in A\}$, $B^* = \{\mathrm{cell}(b) : b \in B\}$,
$S^* = \{\mathrm{cell}(s) : s \in S\}$. This produces subsets $A^*_1, \ldots, A^*_k$, $B^*_1, \ldots, B^*_k$ and a remainder set $R^*$.

Note that $|A^*|, |B^*|, |S^*| = O(n/\ell)$ by monotonicity of $A, B, S$. The parameters in the BSG
Corollary are thus $N = \Theta(n/\ell)$ and $t = 1$. Hence, this step takes $\tilde{O}((n/\ell)^2)$ expected time by Theorem 2.3.

Step 1: For each $(a^*, b^*) \in R^*$, recursively solve the problem for the sets $\{a \in A : \mathrm{cell}(a) = a^*\}$,
$\{b \in B : \mathrm{cell}(b) = b^*\}$, $\{s \in S : \mathrm{cell}(s) = a^* + b^*\}$.

Note that this step creates $|R^*| = O(\alpha (n/\ell)^2)$ recursive calls, where each set lies in a smaller
universe, namely, a translated copy of $[\ell]^d$.

Step 2: For each $i = 1, \ldots, k$, apply the FFT Lemma to generate $\{a \in A : \mathrm{cell}(a) \in A^*_i\} + \{b \in B :
\mathrm{cell}(b) \in B^*_i\}$, which is contained in the superset $T_i = \{s \in \mathbb{Z}^d : \mathrm{cell}(s) \in A^*_i + B^*_i\}$. Report those
generated elements that are in $S$.

Note that the size of $A^*_i + B^*_i$ is $O((1/\alpha)^5 n/\ell)$, and so the size of $T_i$ is $O((1/\alpha)^5 n/\ell \cdot \ell^d)$. As
$k = O(1/\alpha)$, this step takes $\tilde{O}((1/\alpha)^6 n \ell^{d-1})$ expected time.

Correctness is immediate from the BSG Corollary, since $\{(a^*, b^*) \in A^* \times B^* : a^* + b^* \in S^*\}$ is covered
by $R^* \cup \bigcup_{i=1}^{k} (A^*_i \times B^*_i)$.
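To make the role of the cell labels concrete, here is a simplified sketch of the grid filtering only. It does not implement the BSG Corollary or the FFT Lemma: it corresponds to the degenerate choice where every pair of cells with $a^* + b^* \in S^*$ is placed in the remainder set $R^*$ (so Step 2 is empty), and each such pair is then solved by brute force rather than recursively. All function names are hypothetical.

```python
from collections import defaultdict


def cell(p, ell):
    return tuple(x // ell for x in p)


def add(p, q):
    return tuple(x + y for x, y in zip(p, q))


def group_by_cell(points, ell):
    """Map each nonempty cell label to the list of points it contains."""
    groups = defaultdict(list)
    for p in points:
        groups[cell(p, ell)].append(p)
    return groups


def three_sum_plus_by_cells(A, B, S, ell):
    """Report the points of S lying in A + B, for aligned point sets A and B."""
    if any(x % ell >= ell / 2 for p in list(A) + list(B) for x in p):
        raise ValueError("A and B must be aligned")
    cells_a, cells_b, cells_s = (group_by_cell(P, ell) for P in (A, B, S))
    found = set()
    for ca, pts_a in cells_a.items():
        for cb, pts_b in cells_b.items():
            targets = cells_s.get(add(ca, cb))
            if targets is None:
                continue  # by alignedness, no sum from this pair of cells can hit S
            targets = set(targets)
            found.update(add(a, b) for a in pts_a for b in pts_b
                         if add(a, b) in targets)
    return found
```

The point of the BSG Corollary is that the cell pairs surviving this filter are either few (the remainder $R^*$) or covered by a small number of products $A^*_i \times B^*_i$ with small sumsets, which the sketch above does not exploit.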

The expected running time is characterized by the following interesting recurrence: 

$T(n) \le \tilde{O}((n/\ell)^2) + O(\alpha (n/\ell)^2)\, T(\ell) + \tilde{O}((1/\alpha)^6 n \ell^{d-1})$.

Note that the reduction to the aligned case increases only the hidden constant factors in the three terms. We
can see that this recurrence leads to truly subquadratic running time for any constant $d$, even if we use the
trivial upper bound $T(\ell) = O(\ell^2)$ (i.e., don't use recursion), by setting $\ell$ and $1/\alpha$ to be some sufficiently
small powers of $n$.
For example, for $d = 2$, we can set $\ell = n^{0.0707}$ and $1/\alpha = n^{0.1313}$ and obtain

$T(n) \le \tilde{O}(n^{1.8586}) + O(n^{1.7273})\, T(n^{0.0707})$,

which solves to $O(n^{1.859})$.

More precisely, the recurrence solves to $T(n) = \tilde{O}(n^z)$ by setting $\ell = n^x$ and $1/\alpha = n^y$ for $x, y, z$
satisfying the system of equations $z = 2(1 - x) = -y + 2(1 - x) + xz = 6y + (1 - x) + dx$. One can
check that the solution for $z$ indeed obeys the quadratic equation $6z^2 + (d - 11)z - 2d = 0$.
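The system of equations can be checked numerically. The following sketch (with a hypothetical function name) takes the positive root of the quadratic and recovers $x$ and $y$ from the first two equalities; the third equality then serves as a consistency check.

```python
import math


def randomized_exponent(d):
    """Return (x, y, z) with ell = n^x, 1/alpha = n^y and running time n^z."""
    # Positive root of 6 z^2 + (d - 11) z - 2 d = 0.
    z = (11 - d + math.sqrt((d - 11) ** 2 + 48 * d)) / 12
    x = 1 - z / 2   # from z = 2 (1 - x)
    y = x * z       # from z = -y + 2 (1 - x) + x z
    return x, y, z
```

For $d = 2$ the root is $(9 + \sqrt{177})/12$, and $x$, $y$ agree with the exponents of $\ell$ and $1/\alpha$ used in the example above up to rounding.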

Alternatively, the deterministic version of the algorithm has running time given by the recurrence

$T(n) \le O((1/\alpha)^{\mu} (n/\ell)^{\nu}) + O(\alpha (n/\ell)^2)\, T(\ell) + O((1/\alpha)^6 n^{1+\varepsilon} \ell^{d-1})$,

with $\mu = 0.4651$ and $\nu = 2.3729$, which can be solved in a similar way. The quadratic equation now
becomes $(6 - \mu)z^2 + ((1 + \mu)d - 13 + \nu + 2\mu)z - (\nu + 2\mu)d = 0$. □

As $d$ gets large, the exponent in the randomized bound is $2 - 2/(d + 13) - \Theta(1/d^3)$; however, the
deterministic bound is subquadratic only for $d \le 7$ using the current matrix multiplication exponents (if
$\nu = 2$, then we would have subquadratic deterministic time for all $d$). In the remaining applications, we
will mostly emphasize randomized bounds for the sake of simplicity.
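The claim about the deterministic bound can be checked numerically from the quadratic equation in the proof. The sketch below uses a hypothetical function name, takes the two constants 0.4651 and 2.3729 quoted in the proof as given, and ignores the arbitrarily small $\varepsilon$.

```python
import math

MU, NU = 0.4651, 2.3729  # the two constants quoted in the proof of Theorem 3.1


def deterministic_exponent(d, mu=MU, nu=NU):
    """Positive root z of (6-mu) z^2 + ((1+mu) d - 13 + nu + 2 mu) z - (nu + 2 mu) d = 0."""
    a = 6 - mu
    b = (1 + mu) * d - 13 + nu + 2 * mu
    c = -(nu + 2 * mu) * d
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
```

Evaluating the left-hand side of the quadratic at $z = 2$ gives $2(\nu - 1) - d(\nu - 2)$, which is positive exactly when $d < 2(\nu - 1)/(\nu - 2) \approx 7.36$; this is one way to see why the root drops below 2 only for $d \le 7$ with these constants.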

3.2 Application to the 2D Connected Monotone Case, Bounded Monotone (min,+) Convolution, and Binary Histogram Indexing

We say that a set in $\mathbb{Z}^d$ is connected if every two points in the set are connected by a path using only vertices
from the set and edges of unit $L_1$-length. In the case of connected monotone sets $A, B$ in 2D, we show how
to compute a complete representation of $A + B$.

Corollary 3.2. Given connected monotone increasing sets $A, B \subseteq [n]^2$, we can compute the boundary
of $A + B$, a region bounded by two monotone increasing sets, in $O(n^{1.859})$ expected time (or $O(n^{1.864})$
deterministic time).

Proof. First we show that $A + B$ is indeed a region bounded by two monotone increasing sets. Define $I_k$ to
be the set of y-values of $A + B$ at the vertical line $x = k$. Then each $I_k$ is a single interval: to see this, express
$I_k$ as the union of intervals $I_k^{(i)} = \{y : (i, y) \in A\} + \{y : (k - i, y) \in B\}$ over all $i$, and just observe that
each interval overlaps with the next interval as $A$ and $B$ are connected and monotone increasing.

Since the lower/upper endpoints of $I_k$ are clearly monotone increasing in $k$, the conclusion follows.

We reduce the problem to 3SUM+ for three 2D monotone sets. We focus on the lower boundary of
$A + B$, i.e., computing the lower endpoint of the interval $I_k$, denoted by $s_k$, for all $k$. The upper boundary
can be computed in a symmetric way. We compute all $s_k$ by a simultaneous binary search in $O(\log n)$
rounds as follows.

In round $i$, divide $[2n]$ into grid intervals of length $2^{\lceil \log(2n) \rceil - i}$. Suppose that at the beginning of the
round, we know which grid interval $J_k$ contains $s_k$ for each $k$. Let $m_k$ be the midpoint of $J_k$. Form the
set $S = \{(k, m_k) : k \in [2n]\}$. Since the $s_k$'s are monotone increasing, we know that the $J_k$'s and $m_k$'s
are as well; hence, $S$ is a monotone increasing set in $[2n]^2$. Apply the 3SUM+ algorithm to $A, B, S$. If
$(k, m_k)$ is found to be in $A + B$, then $I_k$ contains $m_k$ and so we know that $s_k \le m_k$. Otherwise, $I_k$ is either
completely smaller than $m_k$ or completely larger than $m_k$; we can tell which is the case by just comparing
any one element of $I_k$ with $m_k$, and so we know whether $s_k$ is smaller or larger than $m_k$. (We can easily
pick out one element from $I_k$ by picking any $i$ in the x-range of $A$ and $j$ in the x-range of $B$ with $i + j = k$,
picking any point of $A$ at $x = i$ and any point of $B$ at $x = j$, and summing their y-coordinates.) We can
now reset $J_k$ to the half of the interval that we know contains $s_k$, and proceed to the next round. The total
running time is that of the 3SUM+ algorithm multiplied by $O(\log n)$. □
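The simultaneous binary search in the proof can be sketched as follows. This is an illustration under several assumptions: the 3SUM+ oracle is a brute-force stand-in (any 3SUM+ routine with the same interface could be substituted), the search intervals are kept per column rather than on a fixed dyadic grid (so the query set is not guaranteed to be monotone, which only matters for the fast algorithm), and all names are hypothetical.

```python
def brute_force_three_sum_plus(A, B, S):
    """Stand-in oracle: report the points of S that lie in A + B."""
    sums = {(ax + bx, ay + by) for ax, ay in A for bx, by in B}
    return {s for s in S if s in sums}


def lower_boundary(A, B, three_sum_plus=brute_force_three_sum_plus):
    """Lowest y-value s_k of A + B at each x = k, for connected monotone A, B."""
    col_a, col_b = {}, {}
    for x, y in A:
        col_a.setdefault(x, y)  # any one y-value of A at this x
    for x, y in B:
        col_b.setdefault(x, y)
    xa0, xb1 = min(col_a), max(col_b)
    ks = range(min(col_a) + min(col_b), max(col_a) + max(col_b) + 1)
    # One known element of I_k per column ("pick any i, j with i + j = k").
    sample = {}
    for k in ks:
        i = max(xa0, k - xb1)
        sample[k] = col_a[i] + col_b[k - i]
    y_lo = min(y for _, y in A) + min(y for _, y in B)
    y_hi = max(y for _, y in A) + max(y for _, y in B)
    lo = {k: y_lo for k in ks}
    hi = {k: y_hi for k in ks}
    while any(lo[k] < hi[k] for k in ks):
        mid = {k: (lo[k] + hi[k]) // 2 for k in ks if lo[k] < hi[k]}
        hits = three_sum_plus(A, B, {(k, m) for k, m in mid.items()})
        for k, m in mid.items():
            if (k, m) in hits:
                hi[k] = m          # m is in I_k, so s_k <= m
            elif sample[k] < m:
                hi[k] = m - 1      # I_k lies entirely below m
            else:
                lo[k] = m + 1      # I_k lies entirely above m
    return {k: lo[k] for k in ks}
```

Each iteration of the `while` loop issues a single batched oracle call, matching the one-call-per-round structure of the proof.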

It is possible to modify the algorithm in Theorem 3.1 directly to prove the corollary and avoid the extra
logarithmic penalty, but the preceding black-box reduction is nevertheless worth noting.

Corollary 3.3. Given two monotone increasing sequences $a_0, \ldots, a_{n-1} \in [O(n)]$ and $b_0, \ldots, b_{n-1} \in
[O(n)]$, we can compute their (min,+) convolution in $O(n^{1.859})$ expected time (or $O(n^{1.864})$ deterministic
time).

Proof. We just apply Corollary 3.2 to the connected monotone increasing sets $A = \{(i, a) : a_i \le a \le a_{i+1}\}$
and $B = \{(i, b) : b_i \le b \le b_{i+1}\}$ in $[O(n)]^2$. Then the lowest y-value in $A + B$ at $x = k$ gives the $k$-th
entry of the (min,+) convolution. □
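The reduction in this proof can be illustrated by building the two point sets explicitly and reading off the lowest point of each column of $A + B$. The sketch below uses brute force in place of Corollary 3.2, and it assumes the reading of the sets in which column $i$ contains the y-values from $a_i$ up to $a_{i+1}$ inclusive (with only $a_{n-1}$ in the last column); the function names are hypothetical.

```python
def staircase(seq):
    """Points (i, y) with seq[i] <= y <= seq[i+1]; the last column holds only seq[-1]."""
    pts = []
    for i, lo in enumerate(seq):
        hi = seq[i + 1] if i + 1 < len(seq) else lo
        pts.extend((i, y) for y in range(lo, hi + 1))
    return pts


def min_plus_via_sumset(a, b):
    """(min,+) convolution read off as the lower boundary of A + B (brute force)."""
    A, B = staircase(a), staircase(b)
    lowest = {}
    for ax, ay in A:
        for bx, by in B:
            k, y = ax + bx, ay + by
            if y < lowest.get(k, y + 1):
                lowest[k] = y
    return [lowest[k] for k in range(len(a) + len(b) - 1)]
```

Since the lowest point of column $i$ of $A$ is $(i, a_i)$, and likewise for $B$, the lowest point of $A + B$ at $x = k$ has y-value $\min_i (a_i + b_{k-i})$.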

Remark 3.4. The problems in the preceding two corollaries are in fact equivalent. To reduce in the other
direction, given connected monotone increasing sets $A, B \subseteq [n]^2$, first we may assume that the x-ranges of
$A$ and $B$ have the same length, since we can prepend one of the sets with a horizontal line segment without
affecting the lower boundary of $A + B$. By translation, we may assume that the x-ranges of $A$ and $B$ are
identical and start with 0. We define the monotone increasing sequences $a_i$ = (lowest y-value of $A$ at $x = i$)
and $b_i$ = (lowest y-value of $B$ at $x = i$). Then the (min,+) convolution of the two sequences gives the lower
boundary of $A + B$. The upper boundary can be computed in a symmetric way.

Remark 3.5. We can now compute the (min,+) convolution of two integer sequences with the bounded
differences property, by reducing to the monotone case as noted in the introduction.

This version is also equivalent. To reduce in the other direction, given connected monotone increasing
sets $A = \{a_1, \ldots, a_{|A|}\}$ and $B = \{b_1, \ldots, b_{|B|}\}$ in $[n]^2$ where $a_{i+1} - a_i, b_{i+1} - b_i \in \{(1,0), (0,1)\}$, we
apply the linear transformation $\phi(x, y) = (x + y, y)$. After the transformation, $\phi(a_{i+1}) - \phi(a_i), \phi(b_{i+1}) -
\phi(b_i) \in \{(1,0), (1,1)\}$. When applying the same reduction in Remark 3.4 to the transformed sets $\phi(A)$
and $\phi(B)$, the two resulting monotone increasing sequences will satisfy the bounded differences property
(the differences of consecutive elements are all in $\{0,1\}$). The boundary of $A + B$ can be inferred from the
boundary of $\phi(A) + \phi(B)$.

Corollary 3.6. Given a string ci • • • c„ G {0,1}*, we can preprocess in expected time (or 

^(^1.864) ^gigfjfiijiistic time) into an 0(n)-space structure, so that we can answer histogram queries, i.e., 
decide whether there exists a substring with exactly f OT and j 1 ’s for any query values i,j, in 0(1) time. 

Proof. One reduction to bounded monotone (min,+) convolution has been noted briefly in the introduction.
Alternatively, we can just apply Corollary 3.2 to the connected monotone increasing sets
$A = \{a_0, \ldots, a_n\} \subseteq [n]^2$ and $B = -A$, where the x- and y-coordinates of $a_i$ are the number of 0's and 1's
in the prefix $c_1 \cdots c_i$. (We can make $B$ lie in $[n]^2$ by translation.) Then $a_j - a_i \in A + B$ gives the number of
0's and 1's in the substring $c_{i+1} \cdots c_j$ for any $i < j$. The boundary of $A + B$ gives the desired structure. □
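
To make the shape of the resulting structure concrete, here is a minimal sketch of binary histogram indexing. It is not the subquadratic construction: the boundary is computed by brute force over all substrings in quadratic time, and the names `build_histogram_index` and `has_substring` are hypothetical. It relies on the fact that, for a fixed number of 0's, the achievable numbers of 1's form an interval, so storing the lowest and highest value per number of 0's suffices.

```python
def build_histogram_index(s):
    """Return (lo, hi): for each count x of 0's, the min and max count of 1's
    over all nonempty substrings of s containing exactly x 0's."""
    n = len(s)
    zeros, ones = [0], [0]          # prefix counts, i.e. the points a_0, ..., a_n
    for ch in s:
        zeros.append(zeros[-1] + (ch == "0"))
        ones.append(ones[-1] + (ch == "1"))
    lo, hi = {}, {}
    for i in range(n):
        for j in range(i + 1, n + 1):
            x = zeros[j] - zeros[i]  # coordinates of a_j - a_i
            y = ones[j] - ones[i]
            lo[x] = min(lo.get(x, y), y)
            hi[x] = max(hi.get(x, y), y)
    return lo, hi


def has_substring(index, i, j):
    """Constant-time query: is there a substring with exactly i 0's and j 1's?"""
    lo, hi = index
    return i in lo and lo[i] <= j <= hi[i]
```

A query then costs two dictionary lookups and a comparison; only the preprocessing step differs from the algorithm of the corollary.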

Remark 3.7. This problem is also equivalent. To reduce in the other direction, suppose we have connected
monotone increasing sets $A = \{a_1, \ldots, a_{|A|}\}$ and $B = \{b_1, \ldots, b_{|B|}\}$ in $[n]^2$, given in sorted order with
$a_1 = b_1 = (0,0)$. We set $c_i = 0$ if $a_{i+1} - a_i = (1,0)$, or $1$ if $a_{i+1} - a_i = (0,1)$; and set $d_i = 0$ if
$b_{i+1} - b_i = (1,0)$, or $1$ if $b_{i+1} - b_i = (0,1)$. We then solve histogram indexing for the binary string
$c_{|A|-1} \cdots c_1 0^n d_1 \cdots d_{|B|-1}$. The minimum number of 1's over all substrings with $n + k$ 0's (which can
be found in $O(\log n)$ queries by binary search) gives us the lowest point in $A + B$ at $x = k$. The upper
boundary can be computed in a symmetric way.

4 Generalization to Clustered Sets 

In the main algorithm of the previous section, monotonicity is convenient but not essential. In this section,
we identify the sole property needed: clusterability. Formally, we say that a set in $\mathbb{Z}^d$ is $(K, L)$-clustered
if it can be covered by $K$ disjoint hypercubes each of volume $L$. We say that it is $(K, L, M)$-clustered if
furthermore each such hypercube contains at most $M$ points of the set.
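
As an illustration of the definition, the sketch below measures clusterability with respect to one particular covering, namely the cells of an axis-aligned grid of side length `ell`. The helper names `cell` and `cluster_profile` are hypothetical, and a grid is only one of many possible coverings, so the returned values are upper bounds on the best achievable $K$ and $M$ for volume $L = \ell^d$.

```python
from collections import Counter


def cell(point, ell):
    """Index of the grid cell (of side length ell) containing the point."""
    return tuple(x // ell for x in point)


def cluster_profile(points, ell):
    """Return (K, L, M) such that the set is (K, L, M)-clustered via grid cells:
    K nonempty cells, each of volume L = ell^d, each holding at most M points."""
    counts = Counter(cell(p, ell) for p in points)
    d = len(points[0])
    return len(counts), ell ** d, max(counts.values())
```

For instance, the diagonal staircase `[(i, i) for i in range(16)]` with `ell = 4` occupies 4 cells of volume 16 with 4 points each, matching the observation that a monotone set touches only a few cells of the grid.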

4.1 The Main Algorithm 

Theorem 4.1. Given $(K_A, L, M_A)$-, $(K_B, L, M_B)$-, and $(K_S, L, M_S)$-clustered sets $A, B, S \subseteq \mathbb{Z}^d$ for a
constant $d$, we can solve 3SUM+ in expected time

$$\tilde{O}\big(K_A K_B + (K_A K_B)^{5/7} K_S^{3/7} L^{1/7} W^{6/7} + K_A (K_B W)^{5/6}\big),$$

where $W = \min\{M_A M_B,\ M_A M_S,\ M_B M_S\}$.

Proof. The algorithm is similar to the one in Theorem 3.1 except without recursion. We use a grid of side
length $\ell := L^{1/d}$, and as before, we may assume that $A$ and $B$ are aligned.

Step 0: Apply the BSG Corollary to the sets $A^* = \{\mathrm{cell}(a) : a \in A\}$, $B^* = \{\mathrm{cell}(b) : b \in B\}$, $S^* =
\{\mathrm{cell}(s) : s \in S\}$. This produces subsets $A_1^*, \ldots, A_k^*$, $B_1^*, \ldots, B_k^*$ and a remainder set $R^*$.

Note that $|A^*| = O(K_A)$, $|B^*| = O(K_B)$, $|S^*| = O(K_S)$. The parameters in the BSG Corollary
are thus $N = \Theta(\sqrt{K_A K_B})$ and $t = O(K_S/\sqrt{K_A K_B})$. This step takes $\tilde{O}(K_A K_B + (1/\alpha)^5 K_A)$
expected time by Theorem 2.3.

Step 1: For each $(a^*, b^*) \in R^*$, solve the problem for the sets $\{a \in A : \mathrm{cell}(a) = a^*\}$, $\{b \in B : \mathrm{cell}(b) =
b^*\}$, $\{s \in S : \mathrm{cell}(s) = a^* + b^*\}$.

Note that the three sets have sizes $O(M_A)$, $O(M_B)$, $O(M_S)$, and so the naive brute-force algorithm
which tries all pairs from two of the three sets takes $O(W)$ time. As $|R^*| = O(\alpha N^2) = O(\alpha K_A K_B)$,
this step takes total time $O(\alpha K_A K_B W)$.

Step 2: For each $i = 1, \ldots, k$, apply the FFT Lemma to generate $\{a \in A : \mathrm{cell}(a) \in A_i^*\} + \{b \in B :
\mathrm{cell}(b) \in B_i^*\}$, which is contained in the superset $T_i = \{s \in \mathbb{Z}^d : \mathrm{cell}(s) \in A_i^* + B_i^*\}$. Report those
generated elements that are in $S$.

Note that the size of $A_i^* + B_i^*$ is $O((1/\alpha)^5 t^3 N)$, and so the size of $T_i$ is $O((1/\alpha)^5 t^3 N L)$. As
$k = O(1/\alpha)$, this step takes expected time $\tilde{O}((1/\alpha)^6 t^3 N L) = \tilde{O}((1/\alpha)^6 K_S^3 L/(K_A K_B))$.

The total expected time is $\tilde{O}(K_A K_B + \alpha K_A K_B W + (1/\alpha)^6 K_S^3 L/(K_A K_B) + (1/\alpha)^5 K_A)$. We set
$1/\alpha = \min\{[(K_A K_B)^2 W/(K_S^3 L)]^{1/7},\ (K_B W)^{1/6}\}$. □
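
As a sanity check on the choice of $1/\alpha$, the balancing can be unpacked term by term. The following worked calculation is a sketch that assumes the four cost terms are $K_A K_B$, $\alpha K_A K_B W$, $(1/\alpha)^6 K_S^3 L/(K_A K_B)$, and $(1/\alpha)^5 K_A$, as in the analysis of Steps 0 to 2.

```latex
% Balancing Step 1 against Step 2:
\alpha K_A K_B W = (1/\alpha)^6 \frac{K_S^3 L}{K_A K_B}
\iff (1/\alpha)^7 = \frac{(K_A K_B)^2 W}{K_S^3 L},
\qquad\text{giving cost}\quad
K_A K_B W \cdot \left(\frac{K_S^3 L}{(K_A K_B)^2 W}\right)^{1/7}
= (K_A K_B)^{5/7} K_S^{3/7} L^{1/7} W^{6/7}.

% Balancing Step 1 against the BSG construction cost:
\alpha K_A K_B W = (1/\alpha)^5 K_A
\iff (1/\alpha)^6 = K_B W,
\qquad\text{giving cost}\quad
\frac{K_A K_B W}{(K_B W)^{1/6}} = K_A (K_B W)^{5/6}.
```

Taking $1/\alpha$ to be the smaller of the two balancing values keeps both $(1/\alpha)$-dependent terms within these costs, which accounts for the second and third terms in the stated bound.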

It turns out that the M bounds on the number of points per cluster are not essential, and neither is 
clusterability of the third set S. In fact, clusterability of only one set A is enough to obtain nontrivial results. 
Corollary 4.2. Given sets $A, B, S \subseteq \mathbb{Z}^d$ of size $O(n)$ for a constant $d$ where $A$ and $B$ are $(K_A, L)$- and
$(K_B, L)$-clustered, we can solve 3SUM+ in expected time

$$\tilde{O}\big(K_A K_B + n^{12/7} (K_A L)^{1/7}\big).$$

Proof. We say that a set of size $n$ is equitably $(K, L)$-clustered if it is $(K, L, \Theta(n/K))$-clustered. Suppose
that $A, B, S$ are equitably $(K_A, L)$-, $(K_B, L)$-, and $(K_S, L)$-clustered. Then in Theorem 4.1, we set $M_A =
O(n/K_A)$, $M_B = O(n/K_B)$, $M_S = O(n/K_S)$, and upper-bound $W$ in the second term by the following
weighted geometric mean (with carefully chosen weights):

$$\big((n/K_A)(n/K_B)\big)^{1/2} \big((n/K_A)(n/K_S)\big)^{1/6} \big((n/K_B)(n/K_S)\big)^{1/3} = \frac{n^2}{K_A^{2/3} K_B^{5/6} K_S^{1/2}},$$

and we upper-bound $W$ in the third term more simply by $(n/K_A)(n/K_B)$. This leads to the expression
$\tilde{O}(K_A K_B + n^{12/7}(K_A L)^{1/7} + n^{5/3} K_A^{1/6})$. The third term is always dominated by the second (since $K_A \le
n$), and so we get precisely the stated time bound.

What if $A, B, S$ are not equitably clustered? We can decompose $A$ into $O(\log n)$ equitably $(\le K_A, L)$-
clustered subsets: just put points in hypercubes with between $2^i$ and $2^{i+1}$ points into the $i$-th subset. We can
do the same for $B$ and $S$. The total time increases by at most an $O(\log^3 n)$ factor (since the above bound is
nondecreasing in $K_A$ and $K_B$ and independent of $K_S$). □

The corollary below now follows immediately by just substituting $K_A = n^{1-\delta}$, $L = n$, and $K_B \le n$.

Corollary 4.3. Given sets $A, B, S \subseteq \mathbb{Z}^d$ of size $O(n)$ for a constant $d$ where $A$ is $(n^{1-\delta}, n)$-clustered, we
can solve 3SUM+ in $\tilde{O}(n^{2-\delta/7})$ expected time.

Although it may not give the best quantitative bounds, it describes the most general setting under which
we know how to solve 3SUM in truly subquadratic time.

For example, for $d = 1$, the above corollary generalizes the well-known fact that 3SUM for integer sets
in $[n^{2-\delta}]$ can be solved in subquadratic time (by just doing one FFT), and a not-so-well-known fact that
3SUM for three integer sets where only one set is assumed to be in $[n^{2-\delta}]$ can still be solved in subquadratic
time (by doing several FFTs, without requiring additive combinatorics; this is a simple exercise we leave to the
reader).

Remark 4.4. Although we have stated the above results in $d$ dimensions, the one-dimensional case of
integers contains the essence, since we can map higher-dimensional clustered sets to 1D. Specifically,
consider a grid of side length $\ell := L^{1/d}$, and without loss of generality, assume that $A, B \subseteq [U]^d$ are

aligned. We can map each point (xi,..., Xd) to the integer 

d d 

Xj mod £) ■ ^. 

i=i i=i 

If $A$ is $(K, L)$-clustered, then the mapped set in 1D is still $(O(K), L)$-clustered. By alignedness, 3SUM+
solutions are preserved by the mapping.
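
One way to realize such a mapping is sketched below. This is an illustration under stated assumptions rather than the exact formula of the remark: the function name `map_to_1d` is hypothetical, "aligned" is taken to mean that every coordinate offset `x % ell` is below `ell / 2`, and cell indices are encoded in base `2 * ceil(U / ell)` so that adding two encoded points never produces a carry.

```python
def map_to_1d(point, ell, U):
    """Map a point of [0, 2U)^d to an integer: cell indices go in the high part
    (scaled by L = ell^d), offsets within the cell go in the low part."""
    d = len(point)
    base = 2 * -(-U // ell)          # 2 * ceil(U / ell): room for sums of two cell indices
    L = ell ** d
    cell_part = sum((x // ell) * base ** i for i, x in enumerate(point))
    offset_part = sum((x % ell) * ell ** i for i, x in enumerate(point))
    return L * cell_part + offset_part
```

Under the alignment assumption the map is additive on pairs, that is, the image of $a + b$ equals the sum of the images of $a$ and $b$, and it is injective on $[0, 2U)^d$, so a sum in $d$ dimensions hits an element of $S$ exactly when the mapped sum hits the mapped element. Each grid cell is sent into an interval of length $L$, which is why clusterability carries over.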
4.2 Application to the Monotone Case and Offline Histogram Queries

Corollary 4.5. Given sets $A, B, S \subseteq [n]^d$ of size $O(n)$ for a constant $d$ where $A$ and $B$ are monotone, we
can solve 3SUM+ in $\tilde{O}(n^{2-2/(d+13)})$ expected time.

If only $A$ is monotone, we can solve the problem in $\tilde{O}(n^{2-1/(d+6)})$ expected time.

Proof. A monotone set in $[n]^d$ is $(O(n/\ell), \ell^d)$-clustered for all $\ell$. For example, it is $(O(n^{1-1/d}), n)$-clustered,
and so by Corollary 4.3, we know that truly subquadratic running time is achievable. For the best quantitative
bound, we set $K_A, K_B = O(n/\ell)$ and $L = \ell^d$ in Corollary 4.2 and get $\tilde{O}(n^2/\ell^2 + n^{13/7}\ell^{(d-1)/7})$. We
set $\ell = n^{1/(d+13)}$ to balance the two terms.

If only $A$ is monotone, we get $\tilde{O}(n^2/\ell + n^{13/7}\ell^{(d-1)/7})$. We set $\ell = n^{1/(d+6)}$. □

The above bounds are (predictably) slightly worse than in Theorem 3.1, which assumes the monotonicity
of the third set S. The algorithm there also exploits a stronger “hierarchical” clustering property enjoyed by 
monotone sets (namely, that clusters are themselves clusterable), which allows for recursion. 

Corollary 4.6. Given a string $c_1 \cdots c_n \in [d]^*$ for a constant alphabet size $d$ and a set $S \subseteq [n]^d$ of $O(n)$
vectors, we can answer histogram queries, i.e., decide whether there exists a substring with exactly $i_0$ 0's,
..., and $i_{d-1}$ $(d-1)$'s, for all the vectors $(i_0, \ldots, i_{d-1}) \in S$, in $\tilde{O}(n^{2-2/(d+13)})$ total expected time.

Proof. We just apply the 3SUM+ algorithm in Corollary 4.5 to the connected monotone increasing sets
$A = \{a_0, \ldots, a_n\} \subseteq [n]^d$ and $B = -A$, and the (not necessarily monotone) set $S$, where the $d$ coordinates
of $a_i$ hold the number of 0's, ..., $(d-1)$'s in the prefix $c_1 \cdots c_i$. Then the $d$ coordinates of $a_j - a_i \in A + B$
give the number of 0's, ..., $(d-1)$'s in the substring $c_{i+1} \cdots c_j$ for any $i < j$. □

The above upper bound nicely complements the conditional hardness results by Amir et al. They
proved a lower bound on the histogram problem under the assumption that integer 3SUM+

requires at least time, and an lower bound under a stronger assumption that 3SUM+ 

in [nY requires at least time. (Their results were stated for online queries but hold in the offline 

setting.) On the other hand, if the assumption fails, i.e., integer 3SUM+ turns out to have a truly subquadratic 
algorithm, then there would be a truly subquadratic algorithm for offline histogram queries with an exponent 
independent of d. 

5 Online Queries 

We now show how the same techniques can even be applied to the setting where the points of the third set 
S are not given in advance but arrive online. 

5.1 The Main Algorithm 

Theorem 5.1. Given $(K_A, L, M_A)$- and $(K_B, L, M_B)$-clustered sets $A, B \subseteq \mathbb{Z}^d$ for a constant $d$ and a
parameter $P$, we can preprocess in expected time

$$\tilde{O}\big(K_A K_B + (K_A K_B)^{8/7} (M_A M_B)^{6/7} L^{1/7} / P^{3/7} + K_A (K_B M_A M_B)^{5/6}\big)$$

into a data structure with $O(K_A K_B + K_A K_B L/P)$ space, so that we can decide whether any query point
$s$ is in $A + B$ in $O(\min\{M_A, M_B\} \cdot P)$ time.
Proof. The approach is similar to our previous algorithms, but with one more idea: dividing into the cases
of "low popularity" and "high popularity" cells. As before, we use a grid of side length $\ell := L^{1/d}$ and
assume that $A$ and $B$ are aligned.

The preprocessing algorithm works as follows: 

Step 0: Let $A^* = \{\mathrm{cell}(a) : a \in A\}$ and $B^* = \{\mathrm{cell}(b) : b \in B\}$. Place each $(a^*, b^*) \in A^* \times B^*$ in
the bucket for $s^* = a^* + b^*$. Store all these buckets. Define the popularity of $s^*$ to be the number
of elements in its bucket. Let $S^*$ be the set of all $s^* \in \mathbb{Z}^d$ with popularity $> P$. Apply the BSG
Corollary to $A^*, B^*, S^*$.

Note that $|A^*| = O(K_A)$, $|B^*| = O(K_B)$, and $|S^*| = O(K_A K_B/P)$, because the total popularity
over all possible $s^*$ is at most $O(K_A K_B)$. The parameters in the BSG Corollary are thus $N =
\Theta(\sqrt{K_A K_B})$ and $t = |S^*|/N = O(\sqrt{K_A K_B}/P)$. The buckets can be formed in $O(K_A K_B)$ time
and space. This step takes $\tilde{O}(K_A K_B + (1/\alpha)^5 K_A)$ expected time by Theorem 2.3.

Step 1: For each $(a^*, b^*) \in R^*$, generate the list $\{a \in A : \mathrm{cell}(a) = a^*\} + \{b \in B : \mathrm{cell}(b) = b^*\}$.

Note that naively each such list can be generated in $O(M_A M_B)$ time. Since $|R^*| = O(\alpha N^2) =
O(\alpha K_A K_B)$, this step takes total time $O(\alpha K_A K_B M_A M_B)$.

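Step 2: For each $i = 1, \ldots, k$, apply the FFT Lemma to generate the list $\{a \in A : \mathrm{cell}(a) \in A_i^*\} + \{b \in
B : \mathrm{cell}(b) \in B_i^*\}$, which is contained in the superset $T_i = \{s \in \mathbb{Z}^d : \mathrm{cell}(s) \in A_i^* + B_i^*\}$.

Note that the size of $A_i^* + B_i^*$ is $O((1/\alpha)^5 t^3 N)$, and so the size of $T_i$ is $O((1/\alpha)^5 t^3 N L)$. As $k =
O(1/\alpha)$, this step takes expected time $\tilde{O}((1/\alpha)^6 t^3 N L) = \tilde{O}((1/\alpha)^6 (\sqrt{K_A K_B}/P)^3 \sqrt{K_A K_B}\, L) =
\tilde{O}((1/\alpha)^6 (K_A K_B)^2 L/P^3)$.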

Step 3: Store the union $\mathcal{L}$ of all the lists generated in Steps 1 and 2. Prune elements not in $\{s \in \mathbb{Z}^d :
\mathrm{cell}(s) \in S^*\}$ from $\mathcal{L}$.

Note that the pruned list $\mathcal{L}$ has size at most $|S^*| L = O(K_A K_B L/P)$.

The query algorithm for a given point s works as follows: 

"Low" Case: cell(s) has popularity $\le P$. W.l.o.g., assume $M_A \le M_B$. We look up the bucket for cell(s).
For each $(a^*, b^*)$ in the bucket, we search for some $a \in A$ with $\mathrm{cell}(a) = a^*$ that satisfies $s - a \in B$.
The search time is $O(M_A)$ per bucket entry, for a total of $O(M_A P)$.

"High" Case: cell(s) has popularity $> P$. We just test $s$ for membership in $\mathcal{L}$ in $O(1)$ time.

To summarize, the preprocessing time is $\tilde{O}(K_A K_B + \alpha K_A K_B M_A M_B + (1/\alpha)^6 (K_A K_B)^2 L/P^3 +
(1/\alpha)^5 K_A)$, the space usage is $O(K_A K_B + K_A K_B L/P)$, and the query time is $O(M_A P)$. We set $1/\alpha =
\min\{[(M_A M_B P^3)/(K_A K_B L)]^{1/7},\ (K_B M_A M_B)^{1/6}\}$. □
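
The low/high popularity split can be seen in isolation in the following one-dimensional sketch. It is a simplification under stated assumptions: the class name `SumsetOracle` is hypothetical, the sums for high-popularity cells are generated by brute force rather than by the BSG Corollary and the FFT Lemma, and "aligned" is taken to mean that every offset `x % ell` is below `ell / 2`, so that cell(a + b) = cell(a) + cell(b).

```python
from collections import defaultdict


class SumsetOracle:
    """Decide membership of s in A + B for aligned sets of nonnegative integers."""

    def __init__(self, A, B, ell, P):
        assert all(2 * (x % ell) < ell for x in list(A) + list(B)), "sets must be aligned"
        self.ell, self.P = ell, P
        self.B = set(B)
        self.A_by_cell = defaultdict(list)
        B_by_cell = defaultdict(list)
        for a in A:
            self.A_by_cell[a // ell].append(a)
        for b in B:
            B_by_cell[b // ell].append(b)
        # Step 0: bucket the cell pairs (a*, b*) by s* = a* + b*.
        self.bucket = defaultdict(list)
        for ac in self.A_by_cell:
            for bc in B_by_cell:
                self.bucket[ac + bc].append((ac, bc))
        # High-popularity cells: store all their sums explicitly
        # (brute force here, standing in for Steps 1-3).
        self.high = set()
        for pairs in self.bucket.values():
            if len(pairs) > P:
                for ac, bc in pairs:
                    for a in self.A_by_cell[ac]:
                        for b in B_by_cell[bc]:
                            self.high.add(a + b)

    def contains(self, s):
        pairs = self.bucket.get(s // self.ell, [])
        if len(pairs) > self.P:                      # "High" case: one lookup
            return s in self.high
        # "Low" case: scan at most P bucket entries, one cluster of A per entry.
        return any(s - a in self.B for ac, _ in pairs for a in self.A_by_cell[ac])
```

A larger `P` shifts work from preprocessing and space to query time, which is the tradeoff expressed by the space bound $O(K_A K_B + K_A K_B L/P)$ and the query bound $O(\min\{M_A, M_B\} \cdot P)$.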

Corollary 5.2. Given $(K_A, L)$- and $(K_B, L)$-clustered sets $A, B \subseteq \mathbb{Z}^d$ of size $O(n)$ for a constant $d$ and a
parameter $Q$, we can preprocess in expected time

$$\tilde{O}\big(K_A K_B + n^{12/7} (K_A L)^{1/7} Q^{3/7}\big)$$

into a data structure with $O(K_A K_B + K_A L Q)$ space, so that we can decide whether any query element $s$
is in $A + B$ in $O(n/Q)$ time.
Proof. Recall the definition of equitable clustering in the proof of Corollary 4.2. Suppose that $A, B, S$ are
equitably $(K_A, L)$-, $(K_B, L)$-, and $(K_S, L)$-clustered. Then in Theorem 5.1, setting $M_A = O(n/K_A)$,
$M_B = O(n/K_B)$, $M_S = O(n/K_S)$, and the parameter $P = \max\{K_A, K_B\}/Q \ge K_A^{1/3} K_B^{2/3}/Q$, we
get the desired preprocessing time $\tilde{O}(K_A K_B + n^{12/7} (K_A L)^{1/7} Q^{3/7} + n^{5/3} K_A^{1/6})$ (the last term is always
dominated by the second), space $O(K_A K_B + K_A L Q)$, and query time $O(n/Q)$.

We can reduce to the equitable case as in the proof of Corollary 4.2, by decomposing each set into
$O(\log n)$ subsets. □

5.2 Application to the Monotone Case and Online Histogram Queries 

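Corollary 5.3. Given two monotone sets $A, B \subseteq [n]^d$ for a constant $d$ and a parameter $\delta$, we can preprocess
in $\tilde{O}(n^{2-\delta})$ expected time, so that we can decide whether any query point $s$ is in $A + B$ in
$\tilde{O}(n^{2/3+\delta(d+13)/6})$ time.

If only $A$ is monotone, the query time is $\tilde{O}(n^{2/3+\delta(d+6)/3})$.

Proof. A monotone set in $[n]^d$ is $(O(n/\ell), \ell^d)$-clustered for all $\ell$. We set $K_A, K_B = O(n/\ell)$ and $L = \ell^d$ in
Corollary 5.2 and get $\tilde{O}(n^2/\ell^2 + n^{13/7} \ell^{(d-1)/7} Q^{3/7})$ preprocessing time and $O(n/Q)$ query time. We
set $\ell = n^{\delta/2}$ and $Q = n^{1/3-\delta(d+13)/6}$.

If only $A$ is monotone, we get $\tilde{O}(n^2/\ell + n^{13/7} \ell^{(d-1)/7} Q^{3/7})$ preprocessing time and $O(n/Q)$ query
time. We set $\ell = n^{\delta}$ and $Q = n^{1/3-\delta(d+6)/3}$. □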

If we want to balance the preprocessing cost with the cost of answering $n$ queries, in the case when
$A$ and $B$ are both monotone, we can set $\delta = 2/(d + 19)$ and obtain $\tilde{O}(n^{2-2/(d+19)})$ preprocessing time
and $\tilde{O}(n^{1-2/(d+19)})$ query time. These bounds are (predictably) slightly worse than in Corollary 4.5 in the
offline setting.

Corollary 5.4. Given a string $c_1 \cdots c_n \in [d]^*$ for a constant alphabet size $d$, we can preprocess in $\tilde{O}(n^{2-\delta})$
expected time, so that we can answer histogram queries, i.e., decide whether there exists a substring with
exactly $i_0$ 0's, ..., and $i_{d-1}$ $(d-1)$'s, for any query vector $(i_0, \ldots, i_{d-1})$, in $\tilde{O}(n^{2/3+\delta(d+13)/6})$ time.

Proof. We just apply Corollary 5.3 to the same sets $A$ and $B$ from the proof of Corollary 4.6. □

Remark 5.5. The idea of dividing into the cases of low and high popularity cells has previously been used
by Kociumaka et al. [24] for histogram indexing, but they were able to obtain only a space/query-time
tradeoff, namely, a data structure with $O(n^{2-\delta})$ space and query time. Their data structure
requires close to quadratic preprocessing time. Incidentally, we can easily improve their space/query-time
tradeoff: substituting $K_A, K_B = O(n/\ell)$ and $L = \ell^d$ in Corollary 5.2 gives $O(n^2/\ell^2 + n\ell^{d-1} Q)$ space and
$O(n/Q)$ time. Setting $\ell = n^{\delta/2}$ and $Q = n^{1-\delta(d+1)/2}$ gives $O(n^{2-\delta})$ space and $O(n^{\delta(d+1)/2})$ query
time. All this does not require additive combinatorics (which helps only in improving the preprocessing
time).

6 3SUM+ in Preprocessed Universes 

As one more application, we show that 3SUM can be solved in truly subquadratic time if the universe
has been preprocessed. (Note, though, that the running time below is subquadratic in the size of the three
given universes $A_0, B_0, S_0$, and not of $A, B, S$.) This version of the problem was considered by Bansal and
Williams, who only obtained time bounds of the form $n^2/\mathrm{polylog}\, n$.
Theorem 6.1. Given sets $A_0, B_0, S_0 \subseteq \mathbb{Z}$ of size $O(n)$, we can preprocess in $\tilde{O}(n^2)$ expected time into a
data structure with $O(n^{13/7})$ space, so that given any sets $A \subseteq A_0$, $B \subseteq B_0$, $S \subseteq S_0$, we can solve 3SUM+
for $A, B, S$ in $\tilde{O}(n^{13/7})$ time.

Proof. Our algorithm works as follows:

Preprocessing: Apply the BSG Corollary to $A_0, B_0, S_0$. Store the resulting subsets $A_1, \ldots, A_k$,
$B_1, \ldots, B_k$ and remainder set $R$, and also store each $T_i = A_i + B_i$.

The expected preprocessing time is $\tilde{O}(n^2)$ by Theorem 2.3. As $|R| \le \alpha n^2$, $|T_i| = O((1/\alpha)^5 n)$, and
$k = O(1/\alpha)$, the total space usage is $O(\alpha n^2 + (1/\alpha)^6 n)$.

Now, given $A \subseteq A_0$, $B \subseteq B_0$, $S \subseteq S_0$, we do the following:

Step 1: For each $(a, b) \in R$, if $a \in A$, $b \in B$, and $a + b \in S$, then report $a + b$. This step takes
$O(|R|) = O(\alpha n^2)$ time.

Step 2: For each $i = 1, \ldots, k$, apply the FFT Lemma to generate $(A_i \cap A) + (B_i \cap B)$, which is contained in
the superset $T_i$. Report those generated elements that are in $S$. This step takes $\tilde{O}((1/\alpha)^6 n)$ expected
time.

The total expected time is $\tilde{O}(\alpha n^2 + (1/\alpha)^6 n)$. We set $1/\alpha = n^{1/7}$ to balance the two terms. The part
after preprocessing can be made deterministic, after including an extra $o(n^2)$ cost for
preprocessing the $T_i$'s for the deterministic version of the FFT Lemma. □

In the above theorem, $S$ is superfluous, since we may as well take $S = S_0$ when solving 3SUM+. In
the next theorem, we show that a slightly weaker subquadratic time holds if the universe for S is not given 
in advance. 

Theorem 6.2. Given sets $A_0, B_0 \subseteq \mathbb{Z}$ of size $O(n)$, we can preprocess in $\tilde{O}(n^2)$ expected time into a data
structure with $O(n^2)$ space, so that given any sets $A \subseteq A_0$, $B \subseteq B_0$, $S \subseteq \mathbb{Z}$ of size $O(n)$, we can solve
3SUM+ for $A, B, S$ in $\tilde{O}(n^{19/10})$ time.

Proof. We incorporate one more idea: dividing into the cases of low and high popularity. 

Preprocessing: Place each $(a, b) \in A_0 \times B_0$ in the bucket for $s = a + b$. Store all these buckets. Define
the popularity of $s$ to be the size of its bucket. Let $S_0$ be the set of all elements $s \in \mathbb{Z}$ with popularity
$> n/t$. Apply the BSG Corollary to $A_0, B_0, S_0$, store the resulting subsets $A_1, \ldots, A_k$, $B_1, \ldots, B_k$
and remainder set $R$, and also store each $T_i = A_i + B_i$.

Note that $|S_0| = O(tn)$, because the total popularity is $O(n^2)$. The buckets can be formed in $O(n^2)$
time and space. The expected preprocessing time is $\tilde{O}(n^2)$ by Theorem 2.3. As $|R| \le \alpha n^2$, $|T_i| =
O((1/\alpha)^5 t^3 n)$, and $k = O(1/\alpha)$, the total space usage is $O(n^2 + (1/\alpha)^6 t^3 n)$.

Now, given $A \subseteq A_0$, $B \subseteq B_0$, $S \subseteq \mathbb{Z}$ of size $O(n)$, we do the following:

Step 0: For each $s \in S$ of popularity $\le n/t$, we look up the bucket for $s$, and report $s$ if some $(a, b)$ in the
bucket has $a \in A$ and $b \in B$. The search time is $O(n/t)$ per element in $S$, for a total of $O(n^2/t)$.

Step 1: For each $(a, b) \in R$, if $a \in A$, $b \in B$, and $a + b \in S$, then report $a + b$. This step takes
$O(|R|) = O(\alpha n^2)$ time.

Step 2: For each $i = 1, \ldots, k$, apply the FFT Lemma to generate $(A_i \cap A) + (B_i \cap B)$, which is contained
in the superset $T_i$. Report those generated elements that are in $S$. This step takes $\tilde{O}((1/\alpha)^6 t^3 n)$
expected time.

The total expected time is $\tilde{O}(n^2/t + \alpha n^2 + (1/\alpha)^6 t^3 n)$. We set $t = 1/\alpha = n^{1/10}$ to balance the three
terms. Again the part after preprocessing can be made deterministic. □

Remark 6.3. The above theorem holds for real numbers as well, if we assume an unconventional model of
computation for the preprocessing algorithm. To reduce the real case to the integer case, first sort $A_0 + B_0$
and compute the smallest distance $\delta$ between any two elements in $A_0 + B_0$ in $\tilde{O}(n^2)$ time. Divide the real
line into grid intervals of length $\delta/4$. Without loss of generality, assume that $A_0$ and $B_0$ are aligned. Replace
each real number $x$ in $A_0$ and $B_0$ with the integer $f(x) = \lfloor x/(\delta/4) \rfloor$. Then for any $a \in A_0$, $b \in B_0$, and
$s \in A_0 + B_0$, $s = a + b$ iff $f(s) = f(a) + f(b)$. This reduction however requires the floor function and
working with potentially very large integers afterwards.

Corollary 6.4. Given a vertex-weighted graph $G = (V, E)$ with $n$ vertices, we can decide whether there
exists a (not necessarily induced) copy of $K_{1,3}$ (the star with four nodes) that has total weight exactly equal
to a given value $W$ in $\tilde{O}(n^{2.9})$ time.

Proof. Let $w(v)$ denote the weight of $v$. Preprocess $A_0 = B_0 = \{w(v) : v \in V\}$. Then for each $u \in V$,
we solve 3SUM for $A = B = \{w(v) : v \in N_G(u)\} \subseteq A_0 = B_0$ and $S = W - A - w(u)$. (We can
exclude using a number twice or thrice in solving 3SUM by a standard trick of appending each number with
two or three extra bits.) The $n$ instances of 3SUM can be solved in $\tilde{O}(n^{1.9})$ time each, after preprocessing
in $\tilde{O}(n^2)$ expected time. (The result can be made deterministic, as we can afford to switch to the slower
$O((1/\alpha)^{0.4651} n^{2.3729})$-time preprocessing algorithm.) □

The above “application” is admittedly contrived but demonstrates the potential usefulness of solving 
3SUM in preprocessed universes. (Vassilevska Williams and Williams [35] had a more general result for
counting the number of copies of any constant-size subgraph with a prescribed total vertex weight, but their 
bound is not subcubic for 4-vertex subgraphs.) 

For another application, we can reduce 3SUM for $(K, L, M)$-clustered sets to preprocessing a universe
of size 0{K) and solving O(L^) 3SUM instances. This provides another explanation why subquadratic 
algorithms are possible for certain parameters of clusterability, although the time bounds obtained by this 
indirect approach would not be as good as those from Section 4.

7 Proof and Time Complexity of the BSG Theorem/Corollary 

In this section, we review one proof of the Balog-Szemeredi-Gowers Theorem, in order to analyze its
construction time and derive the time bound for the BSG Corollary as given by Theorem 2.3, which has been
used in all our applications. The proof of the BSG Theorem we will present is due to Balog [4] and independently
Sudakov et al., with minor changes to make it more amenable to algorithmic analysis. The proof is
based on a combinatorial lemma purely about graphs (Balog and Sudakov et al. gave essentially identical
proofs of this graph lemma, but the latter described a simpler reduction of the theorem to the lemma). The
entire proof is short (see Sections 7.1 and 7.3) and uses only elementary arguments, but has intricate details.

To obtain the best running time, we incorporate a number of nontrivial additional ideas. For our random¬ 
ized time bound, we need sampling tricks found in sublinear algorithms. For our deterministic time bound, 
we need efficient dynamic updates of matrix products. 
7.1 A Graph Lemma

Lemma 7.1. (Graph Lemma) Given a bipartite graph $G \subseteq A \times B$, with $|G| \ge \alpha|A||B|$, there exist
$A' \subseteq A$ and $B' \subseteq B$ such that

(i) for every $a' \in A'$, $b' \in B'$, there are $\Omega(\alpha^5|A||B|)$ length-3 paths from $a'$ to $b'$, and

(ii) $|G \cap (A' \times B')| \ge \Omega(\alpha|A'||B|) \ge \Omega(\alpha^2|A||B|)$.

Proof. Let $N_G(v)$ denote the neighborhood of $v$ in graph $G$. Let $\deg_G(v) = |N_G(v)|$. Let $\mathrm{cdeg}_G(u, v) =
|N_G(u) \cap N_G(v)|$ (the number of common neighbors of $u$ and $v$, or equivalently, the number of length-2
paths from $u$ to $v$). The existence of $A'$ and $B'$ is shown via the following concise but clever algorithm.

Algorithm: 

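1: $A_0 = \{a \in A : \deg_G(a) \ge \alpha|B|/2\}$
2: repeat
3:     pick a random $b^* \in B$
4:     $A^* = A_0 \cap N_G(b^*)$
5:     $\mathrm{BAD}^* = \{(a, a') \in A^* \times A^* : \mathrm{cdeg}_G(a, a') \le \alpha^3|B|/2048\}$
6: until $|A^*| \ge \alpha|A|/4$ and $|\mathrm{BAD}^*| \le \alpha^2|A^*||A|/256$
7: $A' = \{a \in A^* : \deg_{\mathrm{BAD}^*}(a) \le \alpha^2|A|/64\}$
8: $B' = \{b \in B : \deg_{G \cap (A' \times B)}(b) \ge \alpha|A'|/4\}$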

Correctness of (i): Line 6 guarantees that the undirected graph $\mathrm{BAD}^*$ with vertex set $A^*$ has at most
$\alpha^2|A^*||A|/256$ edges and thus average degree at most $\alpha^2|A|/128$. From line 7 it follows that $|A'| \ge
|A^*|/2 \ge \alpha|A|/8$.

Fix $a' \in A'$ and $b' \in B'$. By line 8, there are $\ge \alpha|A'|/4 \ge \alpha^2|A|/32$ vertices $a \in A'$ that are adjacent to
$b'$. By line 7, all but $\le \alpha^2|A|/64$ such vertices $a$ satisfy $(a, a') \notin \mathrm{BAD}^*$. By line 5, for each such $a$, there are
$\ge \alpha^3|B|/2048$ length-2 paths from $a'$ to $a$. Thus, there are $\ge (\alpha^2|A|/64) \cdot (\alpha^3|B|/2048) = \Omega(\alpha^5|A||B|)$
length-3 paths from $a'$ to $b'$.

Correctness of (ii): Since $A' \subseteq A_0$, by line 1, $\deg_G(a') \ge \alpha|B|/2$ for every $a' \in A'$ and hence $|G \cap (A' \times
B)| \ge |A'| \cdot (\alpha|B|/2)$. From line 8 it follows that $|G \cap (A' \times B')| \ge |G \cap (A' \times B)| - (\alpha|A'|/4) \cdot |B| \ge
\alpha|A'||B|/4 \ge \alpha^2|A||B|/32$.

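The process ends w.h.p.: Line 1 implies $|G \cap (A_0 \times B)| \ge |G| - (\alpha|B|/2) \cdot |A| \ge \alpha|A||B|/2$. From
line 4 it follows that

$$\mathbb{E}_{b^*}[|A^*|] = \frac{1}{|B|} \sum_{b^* \in B} |A_0 \cap N_G(b^*)| = \frac{1}{|B|} |G \cap (A_0 \times B)| \ge \alpha|A|/2.$$

Line 5 then implies

$$\mathbb{E}_{b^*}[|\mathrm{BAD}^*|] = \sum_{\substack{a, a' \in A_0:\\ \mathrm{cdeg}_G(a,a') \le \alpha^3|B|/2048}} \Pr_{b^*}[a, a' \in N_G(b^*)] = \sum_{\substack{a, a' \in A_0:\\ \mathrm{cdeg}_G(a,a') \le \alpha^3|B|/2048}} \frac{\mathrm{cdeg}_G(a, a')}{|B|} \le |A|^2 \cdot \frac{\alpha^3|B|/2048}{|B|} = \alpha^3|A|^2/2048.$$

Define $Z = \alpha^2|A|(|A^*| - \alpha|A|/4) - 256|\mathrm{BAD}^*|$. Then $\mathbb{E}_{b^*}[Z] \ge \alpha^2|A|(\alpha|A|/4) - \alpha^3|A|^2/8 =
\alpha^3|A|^2/8$. On the other hand, since we always have $Z \le \alpha^2|A|^2$, $\mathbb{E}_{b^*}[Z] \le \alpha^2|A|^2 \Pr_{b^*}[Z > 0]$. Thus,

$$\Pr_{b^*}[Z > 0] \ge (\alpha^3|A|^2/8)/(\alpha^2|A|^2) = \Omega(\alpha).$$

When $Z > 0$, we have simultaneously $|A^*| > \alpha|A|/4$ and $|\mathrm{BAD}^*| < \alpha^2|A^*||A|/256$. Thus, the number
of iterations in lines 2-6 is $\tilde{O}(1/\alpha)$ w.h.p. □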
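
The eight-line algorithm translates almost directly into code. The sketch below is a straightforward exact implementation (no sampling), intended only to make the construction tangible. The function name `graph_lemma_subsets`, the `seed` and `max_iter` parameters, and the choice to count BAD* as unordered pairs are assumptions of this illustration.

```python
import random
from itertools import combinations


def graph_lemma_subsets(A, B, G, alpha, seed=0, max_iter=100000):
    """Return (A', B') as in the Graph Lemma; G is a set of pairs (a, b)."""
    rng = random.Random(seed)
    A, B = list(A), list(B)
    nbr = {a: set() for a in A}
    for a, b in G:
        nbr[a].add(b)

    def cdeg(u, v):                                   # number of common neighbors
        return len(nbr[u] & nbr[v])

    A0 = [a for a in A if len(nbr[a]) >= alpha * len(B) / 2]            # line 1
    for _ in range(max_iter):                                           # lines 2-6
        b_star = rng.choice(B)                                          # line 3
        A_star = [a for a in A0 if b_star in nbr[a]]                    # line 4
        bad = [(u, v) for u, v in combinations(A_star, 2)               # line 5
               if cdeg(u, v) <= alpha ** 3 * len(B) / 2048]
        if (len(A_star) >= alpha * len(A) / 4
                and len(bad) <= alpha ** 2 * len(A_star) * len(A) / 256):
            break
    else:
        raise RuntimeError("no suitable b* found within max_iter draws")

    bad_deg = {a: 0 for a in A_star}
    for u, v in bad:
        bad_deg[u] += 1
        bad_deg[v] += 1
    A_prime = [a for a in A_star if bad_deg[a] <= alpha ** 2 * len(A) / 64]   # line 7
    B_prime = [b for b in B                                                   # line 8
               if sum(1 for a in A_prime if b in nbr[a]) >= alpha * len(A_prime) / 4]
    return A_prime, B_prime
```

On termination, the correctness argument above guarantees $|A'| \ge \alpha|A|/8$ and at least $\alpha^2|A||B|/32$ edges between $A'$ and $B'$, regardless of which $b^*$ was accepted.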

7.2 Time Complexity of the Graph Lemma 

Lemma 7.2. In the Graph Lemma, $A'$ and $B'$ can be constructed by

• a deterministic algorithm in $O(\mathcal{M}(|A|, |A|, |B|))$ time;

• a randomized Monte Carlo algorithm in $\tilde{O}((1/\alpha)^5|A'| + (1/\alpha)|B| + (1/\alpha)^6)$ time (which is correct
w.h.p.), given the adjacency matrix of $G$.


Proof. 

Deterministic time analysis: An obvious deterministic implementation would try all $b^* \in B$ in lines 2-5
until we find one that satisfies the test in line 6. For line 4, we can compute $|A^*|$ for all $b^*$ in $O(|A||B|)$
total time. For line 5, we can compute $|\mathrm{BAD}^*|$ for all $b^*$ as follows: First precompute $\mathrm{cdeg}_G(a, a')$ for all
$a, a' \in A$; this takes $\mathcal{M}(|A|, |B|, |A|)$ time by computing a matrix product $X_1 Y_1$ where $X_1$ is the adjacency
matrix of $G$ and $Y_1$ is its transpose. Let $\mathrm{BAD}_0 = \{(a, a') \in A_0 \times A_0 : \mathrm{cdeg}_G(a, a') \le \alpha^3|B|/2048\}$.
For all $a \in A_0$, $b^* \in B$, precompute $\mathrm{count}(a, b^*)$ = the number of $a'$ with $aa' \in \mathrm{BAD}_0$ and $a' \in N_G(b^*)$;
this takes $\mathcal{M}(|A_0|, |A_0|, |B|)$ time by computing a matrix product $X_2 Y_2$ where $X_2$ is the adjacency matrix
of $\mathrm{BAD}_0$ and $Y_2$ is the adjacency matrix of $G \cap (A_0 \times B)$. Then for all $b^*$, we can compute $|\mathrm{BAD}^*|$ by
summing $\mathrm{count}(a, b^*)$ over all $a \in N_G(b^*)$. Lastly, lines 6-7 take $O(|A||B|)$ time. The total time is
$O(\mathcal{M}(|A|, |B|, |A|))$, since $\mathcal{M}(\cdot, \cdot, \cdot)$ is known to be invariant under permutation of its three arguments.
This is subcubic in $|A| + |B|$.

Randomized time analysis: With Monte Carlo randomization, we now show how to improve the running
time significantly to near linear in $|A| + |B|$, which is sublinear in the size of the input adjacency matrix. To
achieve sublinear complexity, we modify the algorithm where $\deg(\cdot)$ and $\mathrm{cdeg}(\cdot)$ are replaced by estimates
obtained by random sampling. Let $\delta > 0$ be a sufficiently small constant and $N = \sqrt{|A||B|}$.

The following fact will be useful: given a random sample $R \subseteq U$ of size $(1/\delta)^2(1/\alpha)\log N$, for any
fixed subset $X$ we can estimate $|X|$ by $|R \cap X| \cdot |U|/|R|$ with additive error $O(\delta \cdot \max\{|X|, \alpha|U|\})$ w.h.p.
This follows from a Chernoff bound. In particular, w.h.p., $|R \cap X| \ge \alpha|R|$ implies $|X| \ge (1 - O(\delta))\alpha|U|$,
and $|R \cap X| \le \alpha|R|$ implies $|X| \le (1 + O(\delta))\alpha|U|$.
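
The sampling fact is easy to try out numerically. The sketch below is illustrative only: the function name `estimate_size` is hypothetical, sampling is done with replacement, and the logarithm is taken of $|U|$ rather than of $N$, since $N$ has no meaning outside the lemma.

```python
import math
import random


def estimate_size(U, in_X, alpha, delta, seed=0):
    """Estimate |X| for X = {u in U : in_X(u)} from a random sample of U.

    The sample size (1/delta)^2 * (1/alpha) * log|U| follows the fact above;
    the additive error is then O(delta * max(|X|, alpha * |U|)) w.h.p.
    """
    rng = random.Random(seed)
    m = math.ceil((1 / delta) ** 2 * (1 / alpha) * math.log(max(len(U), 2)))
    hits = sum(1 for _ in range(m) if in_X(rng.choice(U)))
    return hits * len(U) / m
```

With $|U| = 10000$, $\alpha = 1/4$ and $\delta = 1/10$, fewer than 4000 membership probes are made, yet the estimate of a set of size 2500 is typically within a few percent of the truth; the number of probes depends on $|U|$ only through the logarithm.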

In line 1, we draw a random sample $R_1 \subseteq B$ of size $(1/\delta)^2(1/\alpha)\log N$. Then for each $a \in A$, we can
replace $\deg_G(a)$ by $|\{b \in R_1 : (a, b) \in G\}| \cdot |B|/|R_1|$ with additive error $O(\delta \cdot \max\{\deg_G(a), \alpha|B|\})$
w.h.p. This gives $A_0$ in $\tilde{O}((1/\alpha)|A|)$ time.

Line 4 takes $O(|A|)$ time.

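Footnote (Chernoff bound): Let $\mu = |X||R|/|U|$. One version of the Chernoff bound states that $\Pr[\,||R \cap X| - \mu| > \delta'\mu\,] \le e^{-\Omega(\min\{\delta'^2\mu,\ \delta'\mu\})}$ (the
first term of the min occurs when $\delta' \le 1$, the second when $\delta' \ge 1$). Set $\delta'\mu = c\delta \cdot \max\{\mu, \alpha|R|\}$ for an arbitrarily large
constant $c$. Then $\delta' \ge c\delta$ and $\delta'\mu \ge c\delta\alpha|R|$, implying that $\min\{\delta'^2\mu, \delta'\mu\} \ge \Omega(\min\{c^2\delta^2\alpha|R|, c\delta\alpha|R|\}) \ge \Omega(c \log N)$. Thus,
$||R \cap X| - \mu| \le O(\delta \cdot \max\{\mu, \alpha|R|\})$ w.h.p. Finally, multiply both sides by $|U|/|R|$.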
In line 5, we draw another (independent) random sample $R_5 \subseteq B$ of size $(1/\delta)^2(1/\alpha)^3\log N$. Then for
each $a, a' \in A^*$, we can replace $\mathrm{cdeg}_G(a, a')$ by $|\{b \in R_5 : (a, b), (a', b) \in G\}| \cdot |B|/|R_5|$ with additive
error $O(\delta \cdot \max\{\mathrm{cdeg}_G(a, a'), \alpha^3|B|\})$ w.h.p. We do not explicitly construct $\mathrm{BAD}^*$; rather, we can probe
any entry of the adjacency matrix of $\mathrm{BAD}^*$ in $\tilde{O}((1/\alpha)^3)$ time.

In line 6, we draw another random sample $R_6 \subseteq A^* \times A^*$ of size $(1/\delta)^2(1/\alpha)^2\log N$. Then
we can replace $|\mathrm{BAD}^*|$ by $|\{(a, a') \in R_6 : (a, a') \in \mathrm{BAD}^*\}| \cdot |A^*|^2/|R_6|$ with additive error $O(\delta \cdot
\max\{|\mathrm{BAD}^*|, \alpha^2|A^*|^2\})$ w.h.p. This takes $\tilde{O}((1/\alpha)^2)$ probes to $\mathrm{BAD}^*$, and thus $\tilde{O}((1/\alpha)^5)$ time.

Recall that the number of iterations for lines 2-6 is $\tilde{O}(1/\alpha)$ w.h.p. Thus, the total time for lines 2-6 is
$\tilde{O}((1/\alpha)|A| + (1/\alpha)^6)$.

In line 7, we draw another random sample $R_7 \subseteq A^*$ of size $(1/\delta)^2(1/\alpha)^2\log N$. Then for each
$a \in A^*$, we replace $\deg_{\mathrm{BAD}^*}(a)$ by $|\{a' \in R_7 : (a, a') \in \mathrm{BAD}^*\}| \cdot |A^*|/|R_7|$ with additive error
$O(\delta \cdot \max\{\deg_{\mathrm{BAD}^*}(a), \alpha^2|A^*|\})$ w.h.p. This takes a total of $\tilde{O}((1/\alpha)^2|A^*|)$ probes to $\mathrm{BAD}^*$, and thus
$\tilde{O}((1/\alpha)^5|A^*|) = \tilde{O}((1/\alpha)^5|A'|)$ time.

In line 8, we draw one final random sample $R_8 \subseteq A'$ of size $(1/\delta)^2(1/\alpha)\log N$. Then for each $b \in
B$, we can replace $\deg_{G \cap (A' \times B)}(b)$ by $|\{a \in R_8 : (a, b) \in G\}| \cdot |A'|/|R_8|$ with additive error $O(\delta \cdot
\max\{\deg_{G \cap (A' \times B)}(b), \alpha|A'|\})$ w.h.p. This takes $\tilde{O}((1/\alpha)|B|)$ time.

The overall running time is $\tilde{O}((1/\alpha)|A| + (1/\alpha)^5|A'| + (1/\alpha)|B| + (1/\alpha)^6)$. Since $|A'| \ge \Omega(\alpha|A|)$, the
first term can be dropped. The correctness proofs of (i) and (ii) still go through after adjusting all constant
factors by $\pm O(\delta)$, if we make $\delta$ small enough. □

(We could slightly improve the a-dependencies in the randomized time bound by incorporating matrix 
multiplication, but they are small enough already that such improvements will not affect the final cost in our 
applications.) 

7.3 Proof of the BSG Theorem 

We claim that the subsets $A'$ and $B'$ from the Graph Lemma already satisfy the conditions stated in the BSG
Theorem. It suffices to verify condition (i). To this end, let $S = \{a + b : (a, b) \in G\}$ and imagine the
following process:

for each $c \in A' + B'$ do
    take the lexicographically smallest $(a', b') \in A' \times B'$ with $c = a' + b'$
    for each length-3 path $a'bab' \in G$ do
        mark the triple $(a' + b,\ a + b,\ a + b') \in S^3$
    end for
end for

By (i) in the Graph Lemma, the number of marks is at least $\Omega(|A' + B'| \cdot \alpha^5|A||B|)$. On the other
hand, observe that each triple $(a' + b, a + b, a + b') \in S^3$ is marked at most once, because from the triple,
$c = (a' + b) - (a + b) + (a + b')$ is determined, from which $a'$ and $b'$ are determined, and from which
$a = (a + b') - b'$ and $b = (a' + b) - a'$ are determined. Thus, the number of marks is at most $|S|^3$.

Putting the two together, we get $|A' + B'| \le O((1/\alpha)^5|S|^3/(|A||B|)) = O((1/\alpha)^5 t^3 N)$. □

The running time for the BSG Theorem is thus as given in Lemma 7.2.
7.4 Proof of the BSG Corollary

Note that although the BSG Corollary statement has $|A||B| = O(N^2)$, we may assume that $|A||B| =
\Theta(N^2)$, since we can change parameters to $\hat{N} = \sqrt{|A||B|} = O(N)$, $\hat{t} = tN/\hat{N} = \Omega(t)$, and $\hat{\alpha} =
\alpha(N/\hat{N})^2 = \Omega(\alpha)$. Then $\hat{\alpha}\hat{N}^2 = \alpha N^2$, and $(1/\hat{\alpha})^5 \hat{t}^3 \hat{N} = O((1/\alpha)^5 t^3 N)$.

We can construct the subsets $A_1, \ldots, A_k$, $B_1, \ldots, B_k$ and the remainder set $R$ in the BSG Corollary,
simply by repeated applications of the BSG Theorem:

1: $G_1 = \{(a, b) \in A \times B : a + b \in S\}$
2: for $i = 1, 2, \ldots$ do
3:     if $|G_i| \le \alpha N^2$ then set $k = i - 1$, $R = G_{k+1}$, and return
4:     apply the BSG Theorem to $G_i$ with parameter $\alpha_i = |G_i|/N^2$ to get subsets $A_i, B_i$
5:     $G_{i+1} = G_i \setminus (A_i \times B_i)$
6: end for


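A naive upper bound on $k$ would be $O((1/\alpha)^2)$, since $\Omega(\alpha^2 N^2)$ edges are removed in each iteration.
For a more careful analysis, observe that

$$|G_{i+1}| \le |G_i| - \Omega(\alpha_i^2 N^2) = |G_i| \left(1 - \Omega\!\left(\frac{|G_i|}{N^2}\right)\right),$$

which implies that

$$\frac{N^2}{|G_{i+1}|} \ge \frac{N^2}{|G_i|} \left(1 + \Omega\!\left(\frac{|G_i|}{N^2}\right)\right) = \frac{N^2}{|G_i|} + \Omega(1).$$

Iterating, we get $N^2/|G_k| \ge \Omega(k)$. Since $|G_k| \ge \alpha N^2$, we conclude that $k \le O(1/\alpha)$.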


□ 
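
The potential argument behind the bound $k \le O(1/\alpha)$ can be checked numerically. The sketch below simulates the slowest case permitted by the recurrence, writing $g_i = |G_i|/N^2$ and assuming each round removes exactly $c\, g_i^2 N^2$ edges for a hypothetical constant $c$ (the function name `peeling_rounds` and the value of `c` are illustrative choices).

```python
def peeling_rounds(alpha, c=0.5):
    """Count rounds of g <- g * (1 - c * g), starting from g = 1, until g <= alpha."""
    g, rounds = 1.0, 0
    while g > alpha:
        g -= c * g * g       # remove a c * g fraction of the remaining edges
        rounds += 1
    return rounds
```

Since $1/g$ grows by at least $c$ per round, the loop stops within about $(1/\alpha - 1)/c$ rounds, far fewer than the naive $(1/\alpha)^2$ estimate when $\alpha$ is small.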


7.5 Time Complexity of the BSG Corollary 

We now analyze the running time for the BSG Corollary. We may assume that $|A| \le |B|$ without loss of
generality. We may also assume that $t \ge \alpha N/|A|$, because otherwise $|\{(a, b) : a + b \in S\}| \le |S||A| \le \alpha N^2$
and so the trivial solution with $k = 0$ would work. We may further assume that $N \ge (1/\alpha)^5 t^3$, because
otherwise $(1/\alpha)^5 t^3 N \ge N^2$, and so the trivial solution with $k = 1$, $A_1 = A$, $B_1 = B$ would already
work. Putting the two assumptions together, we have $N \ge (1/\alpha)^5(\alpha N/|A|)^3 = (1/\alpha)^2(N/|A|)^3 \ge
(1/\alpha)^2 N/|A|$, and so $|A| \ge (1/\alpha)^2$.

The following additional fact will be useful: $\sum_{i=1}^{k} |A_i| = \tilde{O}(|A|)$. To see this, observe that

$$|G_{i+1}| \le |G_i| - \Omega(\alpha_i|A_i||B|) = |G_i| \left(1 - \Omega\!\left(\frac{|A_i|}{|A|}\right)\right),$$

which implies that

$$\sum_{i=1}^{k} |A_i| \le O\!\left(|A| \log \frac{N^2}{|G_k|}\right) \le O(|A| \log(1/\alpha)).$$

This fact implies that the total cost of updating the adjacency matrix as edges are deleted in line 5 is
at most $O(\sum_{i=1}^{k} |A_i||B_i|) \le O(\sum_{i=1}^{k} |A_i||B|) = \tilde{O}(N^2)$. Furthermore, we can construct all the sum sets
$A_i + B_i$ naively, again in total time $O(\sum_{i=1}^{k} |A_i||B_i|) = \tilde{O}(N^2)$. It thus remains to bound the total cost of
the invocations to the BSG Theorem in line 4.
Deterministic time analysis. For the deterministic version of the algorithm, we can naively upper-bound the total time of all $k = O(1/\alpha)$ iterations by $O((1/\alpha)\,\mathcal{M}(|A|, |A|, |B|))$. We show how to improve the $\alpha$-dependency slightly. To achieve the speedup, we modify the implementation of the deterministic algorithm in the Graph Lemma to support dynamic updates in $G$, namely, deletions of subsets of edges.

Suppose we delete $A_i \times B_i$ from $G$. All the steps in the algorithm can be redone in at most $O(|A||B|)$ time, except for the computation of the products $X_1Y_1$ and $X_2Y_2$. As $A_i \times B_i$ is deleted, $X_1$ undergoes changes to $|A_i|$ rows. We can compute the change in $X_1Y_1$ by multiplying the change in $X_1$ (an $|A_i| \times |B|$ matrix if the all-zero rows are ignored) with the matrix $Y_1$, in $\mathcal{M}(|A_i|, |B|, |A|)$ time. Now, $Y_1$ also undergoes changes to $|A_i|$ columns. We can similarly update $X_1Y_1$ under these changes in $\mathcal{M}(|A|, |B|, |A_i|)$ time.

The product $X_1Y_1$ itself undergoes changes in $|A_i|$ rows and columns, and so does the next matrix $X_2$. Moreover, $X_2$ undergoes $z_i$ additional row and column deletions, where $z_i$ is the number of deletions to $A_0$. Also, $Y_2$ undergoes $|A_i|$ row changes and $z_i$ row deletions. We can update $X_2Y_2$ under changes to $|A_i| + z_i$ rows in $X_2$ in $\mathcal{M}(|A_i| + z_i, |A|, |B|)$ time. Next we can update $X_2Y_2$ under changes to $|A_i| + z_i$ columns in $X_2$ in $\mathcal{M}(|A|, |A_i| + z_i, |B|)$ time. Lastly we can update $X_2Y_2$ under changes to $|A_i| + z_i$ rows in $Y_2$ in $\mathcal{M}(|A|, |A_i| + z_i, |B|)$ time.

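Recall that $\mathcal{M}(\cdot, \cdot, \cdot)$ is invariant under permutation of its arguments, and $\sum_{i=1}^k |A_i| = \tilde O(|A|)$. Moreover, $\sum_{i=1}^k z_i \le |A|$, since $A_0$ undergoes only deletions. The overall running time is thus

\[ O\!\left(\sum_{i=1}^k \mathcal{M}(|A_i| + z_i, |A|, |B|)\right) \le O\!\left(\sum_{i=1}^k \left\lceil \frac{|A_i| + z_i}{\alpha |A|} \right\rceil \mathcal{M}(\alpha|A|, |A|, |B|)\right) = \tilde O((1/\alpha)\,\mathcal{M}(\alpha|A|, |A|, |B|)). \]

According to known upper bounds on rectangular matrix multiplication [22, 17, …],

\[ \mathcal{M}(\alpha|A|, |A|, |A|) = O\!\left(\alpha^{\frac{2.3729 - 2}{1 - 0.3029}}\, |A|^{2.3729}\right) = O(\alpha^{0.5349} |A|^{2.3729}) \]

for $\alpha|A| \gg |A|^{0.3029}$, which is true since $|A| \ge (1/\alpha)^2$ by assumption. So our time bound is

\[ \tilde O((1/\alpha)\,\mathcal{M}(\alpha|A|, |A|, |B|)) \le \tilde O((1/\alpha)(|B|/|A|) \cdot \mathcal{M}(\alpha|A|, |A|, |A|)) = \tilde O((1/\alpha)^{0.4651} |A|^{1.3729} |B|) \le \tilde O((1/\alpha)^{0.4651} N^{2.3729}). \]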
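As a sanity check on the exponents in the deterministic bound, the arithmetic can be spelled out as follows. This is a worked computation under the stated assumptions $|A| \le |B|$, $|A||B| = \Theta(N^2)$ and $|A| \ge (1/\alpha)^2$; it adds nothing beyond the constants 2.3729 and 0.3029 quoted above.

```latex
\begin{align*}
\frac{2.3729 - 2}{1 - 0.3029} &= \frac{0.3729}{0.6971} \approx 0.5349,
  & \frac{1}{\alpha}\cdot \alpha^{0.5349} &= (1/\alpha)^{0.4651}, \\
\frac{|B|}{|A|}\cdot |A|^{2.3729} &= |A|^{1.3729}\,|B| = |A|^{0.3729}\cdot |A||B|
  & &\le O\!\left(N^{0.3729}\cdot N^{2}\right) = O\!\left(N^{2.3729}\right), \\
|A| \ge (1/\alpha)^2 &\;\Longrightarrow\; \alpha \ge |A|^{-1/2}
  & &\;\Longrightarrow\; \alpha|A| \ge |A|^{1/2} \gg |A|^{0.3029}.
\end{align*}
```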


Randomized time analysis. For the randomized version of the algorithm, we can bound the total time by

\[ \tilde O\!\left(N^2 + \sum_{i=1}^k \left((1/\alpha)^5 |A_i| + (1/\alpha)|B| + (1/\alpha)^6\right)\right) = \tilde O\!\left(N^2 + (1/\alpha)^5 |A| + (1/\alpha)^2 |B| + (1/\alpha)^7\right). \]

The third and fourth terms can be dropped, because they are dominated by the first and second since $|A| \ge (1/\alpha)^2$ by assumption. In the case $t \ge 1$, the second term can also be dropped, because it is dominated by the first since $N \ge (1/\alpha)^5 t^3$ by assumption.

Since we can efficiently check whether the solution is correct, the Monte Carlo algorithm can be converted into a Las Vegas algorithm. This completes the proof of Theorem 2.3. □

A gap remains between the deterministic and randomized results. For constant $\alpha$, we believe it should be possible to reduce the deterministic running time in the BSG Corollary to $\tilde O(N^2)$, by replacing matrix multiplication with FFT computations, but we are currently unable to bound the $\alpha$-dependency polynomially in the time bound (roughly because as we iterate, the graph $G_i$ becomes less and less well-structured).
8 Proof of the FFT Lemma

To complete the last piece of the puzzle, we now supply a proof of the FFT Lemma. Note that although the statement of the FFT Lemma has $A, B \subseteq [U]^d$, we may assume that $d = 1$, since we can map each point $(x_1, \ldots, x_d)$ to an integer $\sum_{i=1}^d x_i (2U)^{i-1}$.
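The reduction to $d = 1$ can be illustrated with a short sketch. The function names `encode` and `decode` are hypothetical, and coordinates are assumed to lie in $\{0, \ldots, U-1\}$; the point is that base $2U$ leaves room for one addition per coordinate without carries, so sums of encodings are encodings of sums.

```python
def encode(point, U):
    """Map a point in [U]^d to the integer sum_i x_i * (2U)^(i-1)."""
    base = 2 * U
    return sum(x * base**i for i, x in enumerate(point))


def decode(value, U, d):
    """Inverse of encode for coordinates in [0, 2U), e.g. sums of two points."""
    base = 2 * U
    coords = []
    for _ in range(d):
        value, digit = divmod(value, base)
        coords.append(digit)
    return tuple(coords)


if __name__ == "__main__":
    U = 10
    a, b = (3, 9, 0), (7, 9, 5)
    s = tuple(x + y for x, y in zip(a, b))   # coordinate-wise sum
    # The encoding is additive, and the sum can be decoded coordinate by coordinate.
    assert encode(a, U) + encode(b, U) == encode(s, U)
    assert decode(encode(a, U) + encode(b, U), U, 3) == s
```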

The idea is to hash to a smaller universe and then solve the problem on the smaller universe by FFT. As our problem involves sumsets, we need a hash function that is “basically” additive. The following definition suffices for our purposes: we say that a function $h$ is pseudo-additive if there is an associated function $\hat h$ such that $\hat h(h(a) + h(b)) = h(a + b)$ for every $a, b$. For example, the function $h_p(x) = x \bmod p$ is pseudo-additive (with the associated function $\hat h_p = h_p$).

We do not need a single perfect hash function (which would be more time-consuming to generate and may not be pseudo-additive); rather, it suffices to have a small number of hash functions that ensure each element in $T$ has no collisions with respect to at least one hash function. To be precise, define $\mathrm{collide}(h, x) = \{y \in T \setminus \{x\} : h(y) = h(x)\}$. We say that a family $\mathcal{H}$ of functions is pseudo-perfect for $T$ if for every $x \in T$ there is an $h \in \mathcal{H}$ with $|\mathrm{collide}(h, x)| = 0$.

New Problem: Construct a family $\mathcal{H}$ of $k$ pseudo-additive functions from $[U]$ to $[\tilde O(N)]$ that is pseudo-perfect for $T$.

Computing $A + B$, given a pseudo-perfect pseudo-additive family for $T$. Given such an $\mathcal{H}$, we can compute $A + B$ as follows. For each $h \in \mathcal{H}$, we first compute $h(A) + h(B)$ by FFT in $\tilde O(N)$ time and obtain $\hat h(h(A) + h(B))$. Then for each $s \in T$, we identify an $h \in \mathcal{H}$ with $|\mathrm{collide}(h, s)| = 0$, and if $h(s) \in \hat h(h(A) + h(B))$, we report $s$. The total time of the whole algorithm is $\tilde O(kN)$ (assuming that each $h$ and $\hat h$ can be evaluated in constant time). To prove correctness, just note that for $a \in A$, $b \in B$, we have $h(s) = \hat h(h(a) + h(b))$ iff $h(s) = h(a + b)$ iff $s = a + b$, assuming that $|\mathrm{collide}(h, s)| = 0$ (since $a + b \in A + B \subseteq T$).
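The procedure can be sketched for the special case $h_p(x) = x \bmod p$. This is an illustration under assumptions rather than the paper's implementation: the function names are hypothetical, the list `primes` is assumed to be pseudo-perfect for `T`, and a naive double loop stands in for the FFT step (an FFT on the two indicator vectors of length $p$ would give the same set of residues faster).

```python
def residue_sumset(ra, rb, p):
    """All values (x + y) mod p for x in ra, y in rb (naive stand-in for FFT)."""
    return {(x + y) % p for x in ra for y in rb}


def sumset_via_hashing(A, B, T, primes):
    """Recover A + B from a superset T using the hash functions x -> x mod p.

    `primes` is assumed to be pseudo-perfect for T: every s in T is free of
    collisions (within T) under at least one prime in the list.
    """
    result, covered = set(), set()
    for p in primes:
        # bucket T by residue to find the collision-free elements under p
        buckets = {}
        for s in T:
            buckets.setdefault(s % p, []).append(s)
        hashed = residue_sumset({a % p for a in A}, {b % p for b in B}, p)
        for r, bucket in buckets.items():
            if len(bucket) == 1 and bucket[0] not in covered:
                s = bucket[0]
                covered.add(s)
                if r in hashed:          # h(s) lies in hat-h(h(A) + h(B))
                    result.add(s)
    if covered != set(T):
        raise ValueError("primes are not pseudo-perfect for T")
    return result


if __name__ == "__main__":
    A, B = {1, 5, 20}, {2, 30}
    T = {3, 4, 7, 22, 31, 35, 40, 50}    # superset of A + B with two extras
    print(sorted(sumset_via_hashing(A, B, T, [11, 13])))
```

In the example, 7 and 40 collide modulo 11 and 22 and 35 collide modulo 13, but every element of `T` is isolated by at least one of the two primes.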

It remains to solve the problem of constructing the hash functions $\mathcal{H}$.

A standard randomized construction of a pseudo-perfect pseudo-additive family. With randomization, we can simply pick $k = \log N + 1$ random primes $p \in [cN \log^2 U]$ for a sufficiently large constant $c$, and put each function $h_p(x) = x \bmod p$ in $\mathcal{H}$.

To see why this works, consider a fixed $x \in T$. For any $y \in T \setminus \{x\}$, the number of primes $p$ with $y \bmod p = x \bmod p$ is equal to the number of prime divisors of $|x - y|$ and is at most $\log U$. Since the number of primes in $[cN \log^2 U]$ is at least $2N \log U$ for a sufficiently large $c$ by the prime number theorem, $\Pr_p[y \bmod p = x \bmod p] \le 1/(2N)$. Thus, $\Pr_p[|\mathrm{collide}(h_p, x)| \ne 0] \le 1/2$. So, $\Pr[\forall h_p \in \mathcal{H},\ |\mathrm{collide}(h_p, x)| \ne 0] \le 1/2^k \le 1/(2N)$. Therefore, the overall failure probability is at most $1/2$.

Note that we can compute the numbers $|\mathrm{collide}(h, x)|$ for all $x \in T$ for any given hash function in linear time after assigning elements to buckets. In particular, we can verify whether the construction is correct in $\tilde O(N)$ time. We conclude that there is a Las Vegas algorithm with total expected time $\tilde O(N)$.
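The randomized construction, together with the verification step that makes it Las Vegas, might look as follows. This is a sketch under assumptions: `T` is a non-empty set of distinct integers in $[0, U)$ with $U \ge 2$, the constant `c = 4` is an arbitrary illustrative choice, trial division stands in for a real prime generator, and all function names are hypothetical.

```python
import math
import random


def is_prime(n):
    """Trial-division primality test (adequate for small illustrative inputs)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def collision_free(T, p):
    """Elements of T whose residue mod p is shared with no other element of T."""
    buckets = {}
    for x in T:
        buckets.setdefault(x % p, []).append(x)
    return {b[0] for b in buckets.values() if len(b) == 1}


def random_family(T, U, c=4, rng=random):
    """Resample k = log N + 1 random primes until they are pseudo-perfect for T."""
    N = len(T)
    k = max(1, math.ceil(math.log2(N))) + 1
    bound = c * N * math.ceil(math.log2(U)) ** 2     # plays the role of c N log^2 U
    candidates = [p for p in range(2, bound + 1) if is_prime(p)]
    while True:
        primes = [rng.choice(candidates) for _ in range(k)]
        covered = set()
        for p in primes:
            covered |= collision_free(T, p)          # bucket-based check per prime
        if covered == set(T):                        # verification: retry on failure
            return primes


if __name__ == "__main__":
    T = {3, 4, 7, 22, 31, 35, 40, 50}
    print(random_family(T, 64, rng=random.Random(0)))
```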

A new deterministic construction of a pseudo-perfect pseudo-additive family. An obvious way to derandomize the previous method is to try all primes in $[cN \log^2 U]$ by brute force, but the running time would be at least $\Omega(N^2)$. Indeed, that was the approach taken by Amir et al. [3]. We describe a faster deterministic solution by replacing a large prime with multiple smaller primes, using hash functions of the form $h_{p_1,\ldots,p_\ell}(x) = (x \bmod p_1, \ldots, x \bmod p_\ell) \in \mathbb{Z}^\ell$. Such a function remains pseudo-additive (with the associated function $\hat h_{p_1,\ldots,p_\ell}(x_1, \ldots, x_\ell) = (x_1 \bmod p_1, \ldots, x_\ell \bmod p_\ell)$). The idea is to generate the $\ell$ smaller primes in $\ell$ separate rounds. The algorithm works as follows:

1: $S = T$
2: while $|S| \ne 0$ do
3:     for $i = 1$ to $\ell$ do
4:         pick a prime $p_i \in [cN^{1/\ell} \log^2 U]$ with
5:             $|\{x \in S : |\mathrm{collide}(h_{p_1,\ldots,p_i}, x)| < N^{1-i/\ell}\}| \ge |S|/2^i$
6:     end for
7:     put $h_{p_1,\ldots,p_\ell}$ in $\mathcal{H}$, and remove all $x$ with $|\mathrm{collide}(h_{p_1,\ldots,p_\ell}, x)| = 0$ from $S$
8: end while

Consider the inner for loop. Lines 4-5 take $\tilde O(N^{1+1/\ell})$ time by brute force. But why does $p_i$ always exist? Suppose that $p_1, \ldots, p_{i-1}$ have already been chosen, and imagine that $p_i$ is picked at random. Let $C_i(x)$ be a shorthand for $\mathrm{collide}(h_{p_1,\ldots,p_i}, x)$. Consider a fixed $x \in S$ with $|C_{i-1}(x)| < N^{1-(i-1)/\ell}$. For any fixed $y \in T \setminus \{x\}$, $\Pr_{p_i}[y \bmod p_i = x \bmod p_i] \le 1/(2N^{1/\ell})$ by an argument we have seen earlier. Thus,

\[ \mathbb{E}_{p_i}[|C_i(x)|] = \mathbb{E}_{p_i}[|\{y \in C_{i-1}(x) : y \bmod p_i = x \bmod p_i\}|] \le |C_{i-1}(x)|/(2N^{1/\ell}) < N^{1-i/\ell}/2. \]

By Markov’s inequality, $\Pr_{p_i}[|C_i(x)| < N^{1-i/\ell}] \ge 1/2$. Since we know from the previous iteration that there are at least $|S|/2^{i-1}$ elements $x \in S$ with $|C_{i-1}(x)| < N^{1-(i-1)/\ell}$, we then have $\mathbb{E}_{p_i}[|\{x \in S : |C_i(x)| < N^{1-i/\ell}\}|] \ge |S|/2^i$. So there exists $p_i$ with the stated property.

Line 7 then removes at least $|S|/2^\ell$ elements. Hence, the number of iterations in the outer while loop is $k \le \log N / \log\frac{2^\ell}{2^\ell - 1} = O(2^\ell \log N)$. Each function $h_{p_1,\ldots,p_\ell}$ maps to $[\tilde O(N^{1/\ell})]^\ell$, which can easily be mapped back to one dimension in $[\tilde O(N)]$ while preserving pseudo-additivity, for any constant $\ell$. We conclude that there is a deterministic algorithm with total running time $\tilde O(N^{1+1/\ell})$ for an arbitrarily large constant $\ell$. This gives $O(N^{1+\varepsilon})$. (More precisely, we can bound the total deterministic time by $O(N\, 2^{O(\sqrt{\log N \log\log U})})$ by choosing a nonconstant $\ell$.) □
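The round-by-round selection of small primes can be sketched directly from the pseudocode. The following is an illustration under assumptions: `prime_bound` plays the role of $cN^{1/\ell}\log^2 U$ and is supplied by the caller (a `ValueError` signals that it was too small), $N$ is taken to be $|T|$, the function names are hypothetical, and the final step of mapping each tuple back to one dimension is omitted.

```python
import math


def primes_up_to(n):
    """Sieve of Eratosthenes for n >= 2."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, math.isqrt(n) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, flag in enumerate(sieve) if flag]


def collide_counts(T, primes):
    """|collide(h, x)| for every x in T, where h(x) = (x mod p for p in primes)."""
    buckets = {}
    for x in T:
        key = tuple(x % p for p in primes)
        buckets[key] = buckets.get(key, 0) + 1
    return {x: buckets[tuple(x % p for p in primes)] - 1 for x in T}


def deterministic_family(T, ell, prime_bound):
    """Build a pseudo-perfect family of ell-tuples of primes for T."""
    T = sorted(set(T))
    N = len(T)
    candidates = primes_up_to(prime_bound)
    family, S = [], set(T)
    while S:
        chosen = []
        for i in range(1, ell + 1):
            threshold = N ** (1 - i / ell)            # N^(1 - i/ell)
            for p in candidates:                      # brute force over small primes
                counts = collide_counts(T, chosen + [p])
                good = sum(1 for x in S if counts[x] < threshold)
                if good * 2**i >= len(S):             # condition of lines 4-5
                    chosen.append(p)
                    break
            else:
                raise ValueError("prime_bound too small")
        counts = collide_counts(T, chosen)
        S = {x for x in S if counts[x] != 0}          # line 7: drop isolated elements
        family.append(tuple(chosen))
    return family


if __name__ == "__main__":
    print(deterministic_family([3, 4, 7, 22, 31, 35, 40, 50], 2, 100))
```

Since the threshold in the last round equals 1, at least one element of $S$ is isolated and removed per outer iteration, so the loop terminates whenever suitable primes are found.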

Remark 8.1. In the above results, the $\tilde O$ notation hides not only $\log N$ but also $\log U$ factors. For many of our applications, $U = N^{O(1)}$ and so this is not an issue. Furthermore, in the randomized version, we can use a different hash function to reduce $U$ to $N^{O(1)}$ first, before running the above algorithm. In the deterministic version, it seems possible to lower the dependency on $U$ by using recursion.

Remark 8.2. Our hash function family construction has other applications, for example, to the sparse convolution problem: given two nonnegative vectors $u$ and $v$, compute their classical convolution $u * v = z$ (where $z_k = \sum_{i=0}^{k} u_i v_{k-i}$) in “output-sensitive” time, close to $\|z\|$, the number of nonzeros in $z$. The problem was raised by Muthukrishnan [29], and previously solved by Cole and Hariharan [16] with a randomized Las Vegas algorithm in $\tilde O(\|z\|)$ time.

Let $A = \{a : u_a \ne 0\}$ and $B = \{b : v_b \ne 0\}$. Then $\|z\|$ is precisely $|A + B|$. If we are given a superset $T$ of $A + B$ of size $O(\|z\|)$, we can solve the problem deterministically using a pseudo-perfect pseudo-additive family $\mathcal{H}$ as follows: For each $h \in \mathcal{H}$, precompute the vectors $u'$ and $v'$ of length $\tilde O(\|z\|)$ where $u'_i = \sum_{a : h(a) = i} u_a$ and $v'_j = \sum_{b : h(b) = j} v_b$. Compute $u' * v' = z'$ in $\tilde O(\|z\|)$ time by FFT. Compute $z''_\ell = \sum_{k : \hat h(k) = \ell} z'_k$. Then for each $s \in T$ with $|\mathrm{collide}(h, s)| = 0$, set $z_s = z''_{h(s)}$. To prove correctness, first observe that $z'_k = \sum_{a, b : h(a) + h(b) = k} u_a v_b$. If $|\mathrm{collide}(h, s)| = 0$, then

\[ z''_{h(s)} = \sum_{a, b : \hat h(h(a) + h(b)) = h(s)} u_a v_b = \sum_{a, b : h(a + b) = h(s)} u_a v_b = \sum_{a, b : a + b = s} u_a v_b = z_s. \]
Cole and Hariharan’s algorithm is more general and does not require the superset T to be given. It would
be interesting to obtain a deterministic algorithm that similarly avoids T. 
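The sparse convolution procedure of Remark 8.2 can be sketched for hash functions of the form $x \bmod p$. As before, this is an illustration under assumptions: vectors are given as dictionaries from index to nonnegative value, `primes` is assumed to be pseudo-perfect for `T`, the function name is hypothetical, and a naive cyclic convolution stands in for the FFT (folding indices modulo $p$ combines the steps $u' * v' = z'$ and $z' \mapsto z''$).

```python
def sparse_convolution(u, v, T, primes):
    """Sparse convolution z = u * v for nonnegative sparse vectors given as dicts.

    u, v   : {index: value} with nonnegative values
    T      : a superset of the support of z
    primes : assumed pseudo-perfect for T under the hashes x -> x mod p
    """
    z = {}
    for p in primes:
        # fold u and v: uf[i] = sum of u[a] over all a with a mod p == i
        uf, vf = [0] * p, [0] * p
        for a, val in u.items():
            uf[a % p] += val
        for b, val in v.items():
            vf[b % p] += val
        # naive cyclic convolution; an FFT would compute the same vector
        zf = [0] * p
        for i, x in enumerate(uf):
            if x:
                for j, y in enumerate(vf):
                    zf[(i + j) % p] += x * y
        # read off z_s for every s in T that has no collision under p
        buckets = {}
        for s in T:
            buckets.setdefault(s % p, []).append(s)
        for r, bucket in buckets.items():
            if len(bucket) == 1:
                z.setdefault(bucket[0], zf[r])
    return {s: val for s, val in z.items() if val != 0}


if __name__ == "__main__":
    u = {1: 2, 5: 1, 20: 3}
    v = {2: 1, 30: 4}
    T = {3, 4, 7, 22, 31, 35, 40, 50}
    print(sparse_convolution(u, v, T, [11, 13]))
```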

Remark 8.3. Another application is sparse wildcard matching. The problem is: given a pattern and text that 
are sparse with few non-zeroes, find all alignments where every non-zero pattern element matches the text 
character aligned with it. Sparse wildcard matching has applications, such as subset matching, tree pattern 
matching, and geometric pattern matching; see [14].

Cardoze and Schulman [14] proposed a Monte Carlo near-linear-time algorithm. Cole and Hariharan [16] transformed this into a Las Vegas algorithm. Amir et al. [3] considered the indexing version of this problem where the text is preprocessed for subsequent pattern queries. In this setting they proposed an $\tilde O(N^2)$ preprocessing time algorithm, where $N$ is the number of non-zeroes in the text. The query time is then $\tilde O(N)$. The latter is essentially based upon the construction of a deterministic pseudo-perfect pseudo-additive family for $T = \{i : i \text{ is a non-zero text location}\}$. It can be checked that our new deterministic solution is applicable here and thus improves their preprocessing time to $O(N^{1+\varepsilon})$, yielding the first quasi-linear-time deterministic algorithm for sparse wildcard matching.

9 Final Remarks 

We have given the first truly subquadratic algorithms for a variety of problems related to 3SUM. Although 
there is potential for improving the exponents in all our results, the main contribution is that we have broken 
the barrier. 

An obvious direction for improvement would be to reduce the $\alpha$-dependency in the BSG Theorem itself;
our work provides more urgency towards this well-studied combinatorial problem. Recently, Schoen 
has announced such an improvement of the BSG Theorem, but it is unclear whether this result will be useful 
for our applications for two reasons: First, the time complexity of this construction needs to be worked out. 
Second, and more importantly, Schoen’s improvement is for a more basic version of the theorem without G, 
and the extension with G is essential to us. 

Here is one specific mathematical question that is particularly relevant to us:

Given subsets $A, B, S$ of an abelian group of size $N$, we want to cover $\{(a, b) \in A \times B : a + b \in S\}$ by bicliques $A_i \times B_i$, so as to minimize the cost function $\sum_i |A_i + B_i|$. (There is no constraint on the number of bicliques.) Prove worst-case bounds on the minimum cost achievable as a function of $N$.

An $O(N^{13/7})$ bound follows from the BSG Corollary, simply by creating extra “singleton” bicliques to cover $R$, and setting $\alpha$ to minimize $O(\alpha N^2 + (1/\alpha)^6 N)$. An improvement on this combinatorial bound would have implications to at least one of our algorithmic applications, notably Theorem 6.1.
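The choice of $\alpha$ in this bound can be worked out explicitly. The computation below assumes the cost is $\alpha N^2$ for the singleton bicliques covering $R$ plus $k \cdot O((1/\alpha)^5 N) = O((1/\alpha)^6 N)$ for the $k = O(1/\alpha)$ bicliques from the BSG Corollary with $t = O(1)$; it is a sketch of the balancing step, with constants ignored.

```latex
\begin{align*}
f(\alpha) &= \alpha N^2 + \alpha^{-6} N, \\
f'(\alpha) &= N^2 - 6\,\alpha^{-7} N = 0
  \;\Longrightarrow\; \alpha^{7} = \frac{6}{N}
  \;\Longrightarrow\; \alpha = \Theta\!\left(N^{-1/7}\right), \\
\alpha N^2 &= \Theta\!\left(N^{2 - 1/7}\right) = \Theta\!\left(N^{13/7}\right),
  \qquad
  \alpha^{-6} N = \Theta\!\left(N^{6/7 + 1}\right) = \Theta\!\left(N^{13/7}\right).
\end{align*}
```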

We hope that our work will inspire further applications of additive combinatorics in algorithms. For instance, we have yet to study special cases of $k$SUM for larger $k$; perhaps some multi-term extension of the BSG Theorem [9] would be useful there. As an extension of bounded monotone (min,+) convolution, we may also consider (min,+) matrix multiplication for the case of integers in $[n]$ where the rows and columns satisfy monotonicity or the bounded differences property. It would be exciting if the general integer 3SUM or APSP problem could be solved using tools from additive combinatorics.